Diffusion Model
Definition
A diffusion model is a generative AI architecture that creates images, video, or audio by learning to reverse a gradual noising process. Starting from pure random noise, the model iteratively denoises the data until a coherent output emerges. Diffusion models power leading image generators like Midjourney, DALL-E 3, and Stable Diffusion.
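The "gradual noising process" being reversed can be sketched in a few lines. This is a minimal illustration, assuming a standard DDPM-style linear noise schedule (the schedule values here are illustrative, not from any particular model):

```python
import numpy as np

# Hypothetical linear noise schedule: beta rises from 1e-4 to 0.02 over T steps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)  # cumulative signal-retention factor

def add_noise(x0, t, rng):
    """Forward (noising) process: blend clean data x0 with Gaussian noise.

    At step t the sample keeps sqrt(alpha_bar_t) of the signal and gains
    sqrt(1 - alpha_bar_t) worth of noise; by the final step it is nearly
    pure noise, which is where generation starts.
    """
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))        # stand-in for an image
xt, eps = add_noise(x0, t=T - 1, rng=rng)
print(float(alpha_bars[-1]))            # tiny: the sample is almost all noise
```

Training then amounts to teaching a network to predict `eps` from `xt` and `t`; sampling runs the process in reverse.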
How It Works
During training, the model learns to predict and remove noise added to real data samples at various intensity levels. At generation time, it starts with Gaussian noise and applies the learned denoising process over many steps, guided by a text prompt encoded via CLIP or T5. Classifier-free guidance scales the influence of the text condition, balancing prompt adherence with output diversity.
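The classifier-free guidance step described above combines two noise predictions per denoising step: one conditioned on the prompt embedding and one unconditional. A minimal sketch of that combination, with toy arrays standing in for real network outputs (the function name and values are illustrative):

```python
import numpy as np

def cfg_noise_estimate(eps_cond, eps_uncond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional noise
    prediction toward the text-conditioned one.

    guidance_scale = 1.0 recovers the plain conditional prediction;
    larger values push the sample harder toward the prompt, trading
    diversity for adherence.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy stand-ins for the two network outputs at one denoising step.
eps_cond = np.array([1.0, 0.5])
eps_uncond = np.array([0.2, 0.1])

guided = cfg_noise_estimate(eps_cond, eps_uncond, guidance_scale=7.5)
print(guided)
```

At generation time this guided estimate replaces the raw conditional prediction at every denoising step, which is why raising the scale increases prompt adherence while reducing output variety.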
Key Tools
- Midjourney: AI image generation with exceptional artistic quality. Pricing: $10/mo
- DALL-E (OpenAI): AI system that creates images from natural language descriptions. Pricing: $20/mo (ChatGPT Plus) / API from $0.04/image
- Stable Diffusion (Stability AI): open-source image generation model for creative workflows. Pricing: free (open source); API from $0.01/image
- Flux (Black Forest Labs): next-generation open image model with exceptional prompt adherence. Pricing: free (open source); API usage-based
- Sora: AI model that creates realistic video from text prompts. Pricing: $20/mo (ChatGPT Plus)