Dall-E2 vs Stable Diffusion: Same Prompt, Different Results

Jim Clyde Monge
4 min read · Aug 25, 2022
Dall-E2 vs Stable Diffusion AI-generated image. Portrait of a girl in a coffeeshop, reading a book, dramatic lighting
Image by Jim Clyde Monge

When it comes to text-to-image AI tools, many options are already accessible right now. But two stand out: Dall-E2 and Stable Diffusion.

Both have their own unique strengths and weaknesses, but which one is the better tool?

In this story, I will compare the image results of the two AI tools using the same text prompt. But let’s define them first.

What is Dall-E2?

Dall-E2 is an artificial intelligence program from OpenAI that creates images from textual descriptions. The original Dall-E was revealed on January 5, 2021; Dall-E2, its higher-resolution successor, followed in April 2022.

While the original Dall-E used a 12-billion parameter version of the GPT-3 transformer model to interpret natural language inputs, Dall-E2 pairs a CLIP text-and-image embedding model with a diffusion decoder to generate more realistic, higher-resolution images.

What is Stable Diffusion?

Stable Diffusion is a text-to-image model that employs a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts, a design similar in spirit to Google’s Imagen (which conditions on a frozen T5 text encoder).

Stable Diffusion performs image generation as a “diffusion” process at runtime. Starting with pure noise, it gradually denoises the image step by step, bringing it closer and closer to the provided text description. It runs this process in a compressed latent space rather than on full-resolution pixels, which is what makes it fast enough for consumer GPUs.
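The core loop can be pictured with a toy sketch. This is not Stable Diffusion’s actual code: a real diffusion model uses a trained neural network to predict the noise to remove at each step, and the `target` array below is only a stand-in for the image implied by the prompt. It just illustrates the idea of starting from noise and refining toward a target over many small steps:

```python
import numpy as np

def toy_denoise(target, steps=50, seed=0):
    """Toy illustration of iterative denoising (not the real algorithm)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(target.shape)  # start from pure noise
    for t in range(steps):
        # Move a fraction of the way toward the target on each step,
        # mimicking how every diffusion step strips away some noise.
        x = x + (target - x) / (steps - t)
    return x

target = np.linspace(0.0, 1.0, 8)  # stand-in for the "image"
result = toy_denoise(target)
print(np.allclose(result, target))  # prints True: all noise removed
```

The point of the sketch is the shape of the computation: many small corrections, each conditioned on where the image should end up, rather than one single generation step.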
