Modalities

Text-to-Video

AI that generates video clips from a text description — the next frontier after text-to-image, with rapidly improving quality.

01 ——

In plain English

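Text-to-video is a class of generative AI that produces video clips from a written prompt. It extends text-to-image into the time dimension: the model has to generate many frames that stay consistent with one another, which makes the task significantly harder. Even so, quality is advancing fast.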

Leading text-to-video tools:

  • OpenAI Sora — long, photorealistic clips
  • Runway Gen-3 / Gen-4 — popular for filmmakers
  • Pika — short clips, easy to use
  • Luma Dream Machine — quality on par with Runway
  • Google Veo — high-resolution, high-fidelity

Current limits:

  • Length — most tools cap at 5–30 seconds per clip
  • Consistency — characters and objects can morph between frames
  • Physics — bodies can warp, objects can pass through each other
  • Cost — video generation is much more compute-intensive than images
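To make the length and cost limits concrete, here is a rough back-of-the-envelope sketch. The 24 frames-per-second rate and the 10-second cap are illustrative assumptions, not the specifications of any tool listed above, and the helper names are hypothetical. Frame count is only a loose proxy for compute, since real systems do not necessarily generate frames one at a time.

```python
import math


def frame_count(duration_s: float, fps: int = 24) -> int:
    """Number of frames in a clip of the given length (fps is an assumed rate)."""
    return round(duration_s * fps)


def clips_needed(total_s: float, max_clip_s: float) -> int:
    """How many separately generated clips are needed to cover total_s seconds."""
    return math.ceil(total_s / max_clip_s)


# A single 10-second clip at the assumed 24 fps:
print(frame_count(10))        # 240 frames, versus 1 for a still image

# A 2-minute video built from clips capped at an assumed 10 seconds each:
print(clips_needed(120, 10))  # 12 clips
```

Each of those clips is generated separately, which is one reason characters and objects can drift between shots.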

Despite the limits, text-to-video is already used in advertising, music videos, and short-form social content.


Vol. 4 · Issue 19 · Last reviewed 2026-05-30
