Modalities

Diffusion Model

The type of AI model behind most modern image and video generators — it learns to create content by reversing a noising process.

01 ——

In plain English

A diffusion model is a generative model that learns to produce images, video, or audio by reversing a process of gradually adding noise. During training, the model sees real images progressively corrupted with random noise and learns to undo that corruption one step at a time. To generate something new, it starts from pure random noise and denoises step by step until a coherent image emerges.
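
As a rough illustration of the noising and denoising idea, here is a minimal Python sketch. It assumes a DDPM-style linear noise schedule, which this page does not specify, and the function names (make_alpha_bars, add_noise, remove_noise) and schedule values are hypothetical. It also hands the true noise to the denoising step; a real diffusion model would use a trained neural network to predict that noise instead.

```python
import math
import random


def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal kept after each step (assumed linear schedule)."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta  # each step removes a little more signal
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, alpha_bar, rng):
    """Forward process: blend the clean values with Gaussian noise."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    xt = [signal * x + noise * e for x, e in zip(x0, eps)]
    return xt, eps


def remove_noise(xt, eps, alpha_bar):
    """Invert the blend given the noise; a trained model would predict eps."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [(x - noise * e) / signal for x, e in zip(xt, eps)]


rng = random.Random(0)
alpha_bars = make_alpha_bars()
x0 = [0.5, -0.25, 1.0, 0.0]  # toy "image" of four pixel values
xt, eps = add_noise(x0, alpha_bars[499], rng)  # corrupt to the halfway step
recovered = remove_noise(xt, eps, alpha_bars[499])
```

Early in the schedule almost all of the original signal survives, while by the last step the values are nearly pure noise. That is why generation can start from random noise: if the model predicts the noise well enough at each step, it can walk back toward a clean image.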

Where you'll see them:

Image generation — DALL-E, Midjourney, Stable Diffusion, Flux
Video generation — Sora, Runway, Pika
Audio generation — music and sound effect generators
3D and design — emerging applications

Why diffusion replaced GANs: Earlier image generators mostly used GANs (Generative Adversarial Networks), which were unstable and hard to train. Diffusion models train more reliably and tend to produce higher-quality, more controllable outputs, which is why most major image generators now use them.

02 ——