You Can Now Chat With Your Images Using AI — Here’s How
By now, you have probably come across artificial intelligence (AI) image generators such as DALL-E 2, Midjourney, or Stable Diffusion. You have likely also used AI chatbots like ChatGPT, Bard, and Bing Chat.
But have you ever thought about the possibility of an AI tool that not only interacts with you but also allows you to communicate with your images?
That’s the promise of LLaVA, an AI tool that lets you upload images and hold a conversation about them.
Here’s an example:

What is LLaVA?
LLaVA, or Large Language-and-Vision Assistant, is a multimodal AI model that combines image interpretation with conversational interaction.

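LLaVA builds on two pre-trained components, the CLIP ViT-L/14 visual encoder and the large language model Vicuna, and uses a projection matrix to connect them. The matrix maps the encoder's image features into the language model's word-embedding space, so the model can read an image as if it were part of the text.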
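To make the idea of a projection matrix more concrete, here is a minimal NumPy sketch. It is not LLaVA's actual code: the function names are hypothetical, the arrays are random stand-ins for the real encoder output and trained weights, and the dimensions (256 patches, 1024 visual features, 4096 language-model dimensions) are illustrative assumptions rather than figures from this article.

```python
import numpy as np

# Illustrative dimensions (assumptions, not taken from the article).
NUM_PATCHES = 256   # e.g. a 16 x 16 grid of image patches
VISION_DIM = 1024   # feature width of the visual encoder
TEXT_DIM = 4096     # embedding width of the language model


def project_image_features(patch_features, projection):
    """Map visual features into the language model's embedding space."""
    return patch_features @ projection


def build_input_sequence(image_tokens, text_tokens):
    """Place the projected image tokens in front of the text token embeddings."""
    return np.concatenate([image_tokens, text_tokens], axis=0)


rng = np.random.default_rng(0)
# Random stand-ins for the encoder output, the trained matrix, and 12 embedded prompt tokens.
patch_features = rng.standard_normal((NUM_PATCHES, VISION_DIM))
projection = rng.standard_normal((VISION_DIM, TEXT_DIM)) * 0.02
text_tokens = rng.standard_normal((12, TEXT_DIM))

image_tokens = project_image_features(patch_features, projection)
sequence = build_input_sequence(image_tokens, text_tokens)

print(image_tokens.shape)  # (256, 4096)
print(sequence.shape)      # (268, 4096)
```

After the projection, each image patch has the same shape as a word embedding, so the language model can process image and text as one sequence.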
Try it yourself
You can try LLaVA in this Gradio web app.
Upload your image, and then you can ask the AI questions about the image. For example, you could ask, “What is this image about?” LLaVA will then respond to your question with a text answer.
Here’s an example:

You may be surprised by how accurately it grasps the content of an image. It can even identify well-known artworks such as the Mona Lisa, which suggests it understands the wider context of an image and not just the objects in it.