
You Can Now Chat With Your Images Using AI — Here’s How

Jim Clyde Monge
May 25, 2023
Image by Jim Clyde Monge. Generated with Midjourney.

By now, you are probably familiar with AI image generators such as DALL·E 2, Midjourney, and Stable Diffusion. You have most likely also used AI chatbots such as ChatGPT, Bard, and Bing Chat.

But what about an AI tool that does more than chat with you, one that lets you hold a conversation about your own images?

That is the promise of LLaVA: you upload an image and then chat with the AI about it.

Here’s an example:

LLaVA: Visual Instruction Tuning

What is LLaVA?

LLaVA (Large Language-and-Vision Assistant) is a multimodal AI model that combines image understanding with conversational interaction: it can look at a picture and then answer questions about it in natural language.

LLaVA: Large Language-and-Vision Assistant

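LLaVA builds on two pre-trained components: the CLIP ViT-L/14 visual encoder and the Vicuna large language model. A simple trainable projection matrix connects them. It maps the image features produced by CLIP into Vicuna's word-embedding space, so the language model can read the image as a sequence of "visual tokens" alongside the text of your question.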
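To make the idea of a projection matrix more concrete, here is a minimal Python sketch of a linear projection from vision features to text-embedding space. It is an illustration under stated assumptions, not LLaVA's actual code: the function name `project_visual_features`, the toy sizes, and the random weights are all hypothetical. In the real model, the input features come from CLIP ViT-L/14, the matrix is learned during training, and the output width matches Vicuna's embedding size.

```python
import random

def project_visual_features(patch_features, W):
    """Map each visual patch feature (length d_vision) to a vector in the
    language model's embedding space (length d_text) via h = z @ W."""
    d_vision = len(W)
    d_text = len(W[0])
    tokens = []
    for z in patch_features:
        assert len(z) == d_vision, "feature width must match the projection input"
        # One output vector per image patch: a plain matrix-vector product.
        h = [sum(z[i] * W[i][j] for i in range(d_vision)) for j in range(d_text)]
        tokens.append(h)
    return tokens

# Hypothetical toy sizes: 4 image patches, 8-dim vision features, 6-dim text embeddings.
random.seed(0)
num_patches, d_vision, d_text = 4, 8, 6
patch_features = [[random.gauss(0, 1) for _ in range(d_vision)] for _ in range(num_patches)]
W = [[random.gauss(0, 0.1) for _ in range(d_text)] for _ in range(d_vision)]  # trainable in LLaVA

visual_tokens = project_visual_features(patch_features, W)
text_tokens = [[0.0] * d_text for _ in range(3)]  # stand-in embeddings for a 3-token prompt
llm_input = visual_tokens + text_tokens  # simplification: image tokens placed before the prompt
print(len(llm_input), len(llm_input[0]))  # 7 6
```

Because every projected vector has the same width as a text-token embedding, the language model can treat image patches and prompt words as one sequence. Placing the image tokens before the prompt is a simplification made for this sketch.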

Try it yourself

You can try LLaVA in this Gradio web app.

Upload an image and then ask questions about it. For example, you could ask, “What is this image about?” LLaVA replies with a text answer.

Here’s an example:

LLaVA: Large Language and Vision Assistant

You’ll be surprised at how accurately it grasps what is in an image. It can even recognize famous artworks such as the Mona Lisa, which suggests it understands an image’s context and not just the objects in it.
