Modalities

Diffusion Model

The type of AI model behind most modern image and video generators — it learns to create content by reversing a noising process.

01 ——

In plain English

A diffusion model is a generative model that learns to produce images, video, or audio by reversing a process of gradually adding noise. During training, the model sees real examples (images, for instance) progressively corrupted with random noise and learns to undo that corruption one step at a time. To generate something new, it starts from pure random noise and denoises step by step until a coherent image emerges.
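The noising and denoising described above can be sketched in a few lines. This is a minimal illustration, not any particular product's implementation: it assumes a linear noise schedule and the common "predict the noise" formulation, and the names `make_alpha_bars`, `add_noise`, and `estimate_clean` are hypothetical. In place of a trained neural network, the sketch hands the true noise back to the denoising step, purely to show that a good noise estimate is enough to recover the clean signal.

```python
import math
import random

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal left after each step (assumed linear schedule)."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta          # each step keeps a little less signal
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, alpha_bar, rng):
    """Forward (noising) process: blend the clean signal with Gaussian noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * n
          for x, n in zip(x0, noise)]
    return xt, noise

def estimate_clean(xt, predicted_noise, alpha_bar):
    """Undo the blend given a noise estimate (a trained network would supply it)."""
    return [(x - math.sqrt(1.0 - alpha_bar) * n) / math.sqrt(alpha_bar)
            for x, n in zip(xt, predicted_noise)]

rng = random.Random(0)
alpha_bars = make_alpha_bars()
x0 = [0.5, -1.0, 0.25, 0.0]            # toy "image" of four pixel values
xt, noise = add_noise(x0, alpha_bars[499], rng)

# Oracle stand-in for the model: pass the true noise as the "prediction".
recovered = estimate_clean(xt, noise, alpha_bars[499])
```

In a real system the noise estimate comes from a neural network trained on many (noisy input, noise) pairs, and generation repeats a denoising update across many steps rather than jumping back in one.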

Where you'll see them:

  • Image generation — DALL-E, Midjourney, Stable Diffusion, Flux
  • Video generation — Sora, Runway, Pika
  • Audio generation — music and sound effect generators
  • 3D and design — emerging applications

Why diffusion replaced GANs: Earlier image generators were mostly built on GANs (Generative Adversarial Networks), whose training is often unstable and hard to tune. Diffusion models train more stably and tend to produce higher-quality, more controllable outputs, which is why most major image generators now use them.

Vol. 4 · Issue 19 · Last reviewed 2026-05-30
