Multimodal Content Publishing: Apartment Inspection Reports with Graphlit and GPT-4 Vision
January 24, 2024
One potential use case is inspection reports of apartments. Often, an apartment manager will take move-in and move-out photos of an apartment for their records.
We can use the power of the GPT-4 Vision model to analyze these images, and use Graphlit's content publishing capabilities to automatically generate a written inspection report for the apartment manager.
Together, Graphlit and GPT-4 Vision support automated workflows for multimodal unstructured data, which can easily be incorporated into any vertical AI application. Graphlit enables these capabilities with just a few API calls, which would take developers weeks, if not months, to build themselves.
First, we need to create an LLM specification for the publishing process. We are using the GPT-4 Turbo 128k model to get the best output quality for the inspection report. Also, we are setting the `temperature` to 0.5, which gives a nice creative balance. You can adjust these parameters to your liking, for your specific use case.
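As a rough sketch, creating such a specification looks something like the following GraphQL mutation. The exact input fields and enum values shown here are assumptions based on Graphlit's API conventions; consult the Graphlit API reference for the current schema.

```graphql
# Hypothetical sketch; input field names and enum values are illustrative.
mutation CreateSpecification($specification: SpecificationInput!) {
  createSpecification(specification: $specification) {
    id
    name
  }
}
```

with variables along these lines:

```json
{
  "specification": {
    "name": "Inspection Report Publishing",
    "type": "COMPLETION",
    "serviceType": "OPEN_AI",
    "openAI": {
      "model": "GPT4_TURBO_128K",
      "temperature": 0.5
    }
  }
}
```

The returned specification `id` is what you will reference later when publishing.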
Next, in order to analyze images with the GPT-4 Vision model, we need to opt in to image analysis by assigning an image analysis workflow to the ingested content. We are selecting the `HIGH` detail level from the GPT-4 Vision model, which gives the most detailed description of images. It does incur higher credit usage, but the quality difference is noticeable.
Note, the specifications and workflows can be reused across other use cases, and don't need to be created each time you are publishing content.
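A minimal sketch of creating such a reusable workflow is shown below. The workflow input shape here is a hypothetical illustration, not the exact Graphlit schema; the key idea is that image analysis (and its detail level) is configured once on the workflow, then applied to everything ingested with it.

```graphql
# Hypothetical sketch; the workflow input shape is illustrative.
mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
  }
}
```

with variables resembling:

```json
{
  "workflow": {
    "name": "Image Analysis (HIGH detail)",
    "preparation": {
      "detailLevel": "HIGH"
    }
  }
}
```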
Apartment Inspection Images
We've created a feed to ingest a set of images taken by the apartment manager. We won't go into the details here, but you can read more about creating a feed in the Graphlit documentation.
Here is an example description of one of the images, using GPT-4 Vision, and our default Graphlit image analysis prompt.
Publishing From Image Descriptions
Now that we have the images ingested into Graphlit, with their descriptions generated by the GPT-4 Vision model, we can leverage the `publishContents` mutation to summarize and publish our inspection report as a new content item.
Here we are telling the model to generate Markdown content, but you can generate text, Markdown, or HTML with this publishing process. Also, you will need to specify the `publishSpecification` that we created above.
We used this prompt to generate the inspection report for our apartment manager, but any publishing prompt can be used for your needs.
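Putting it together, the publish call might look like the sketch below. The argument names (`publishPrompt`, `connector`, `publishSpecification`, `filter`) are assumptions based on the mutation described above; check the Graphlit API reference for the exact signature.

```graphql
# Hypothetical sketch; argument names are illustrative.
mutation PublishContents(
  $publishPrompt: String!
  $connector: ContentPublishingConnectorInput!
  $publishSpecification: EntityReferenceInput
  $filter: ContentFilter
) {
  publishContents(
    publishPrompt: $publishPrompt
    connector: $connector
    publishSpecification: $publishSpecification
    filter: $filter
  ) {
    id
    markdown
  }
}
```

Here the `connector` variable would select Markdown output, for example `{ "type": "TEXT", "format": "MARKDOWN" }`, while the `filter` scopes publishing to the ingested inspection images.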
Apartment Inspection Report
You can see the level of detail that is enabled with the GPT-4 Vision model, and how it identified the white electric stove, the hexagonal tiles in the bathroom, and even called out how the colored border in the bathroom doesn't match the overall minimalist style of the apartment.
You can imagine an apartment manager using this publishing method to generate copy for a real estate website, or generating a move-out report, which identifies potential damage.
The possibilities are limitless when you combine multimodal models like GPT-4 Vision with the content publishing capabilities of Graphlit.
Please email any questions on this tutorial or the Graphlit Platform to firstname.lastname@example.org.