Text-to-Speech

AI that converts written text into natural-sounding spoken audio — used for narration, accessibility, voice assistants, and content creation.

01 ——

In plain English

Text-to-speech (TTS) is AI that turns written text into spoken audio. Modern neural TTS can produce voices that listeners often find hard to tell apart from human recordings, and many systems offer control over tone, accent, emotion, and pacing.
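
One common way to control pacing and pitch is SSML (Speech Synthesis Markup Language), a W3C standard that several TTS services accept. The sketch below only builds the markup string; the helper name `to_ssml` is hypothetical, and the exact attributes and wrapper elements a given provider requires may differ.

```python
from xml.sax.saxutils import escape


def to_ssml(text: str, rate: str = "medium", pitch: str = "medium") -> str:
    """Wrap plain text in SSML with a prosody element for pacing and pitch.

    `to_ssml` is an illustrative helper, not part of any provider SDK.
    """
    # escape() converts &, < and > so the text cannot break the XML markup.
    return (
        f'<speak><prosody rate="{rate}" pitch="{pitch}">'
        f"{escape(text)}"
        "</prosody></speak>"
    )


print(to_ssml("Terms & conditions apply.", rate="slow"))
```

The resulting string would then be sent to a TTS service in place of plain text, assuming that service accepts SSML input.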

Common uses:

  • Audiobooks and narration — narrate articles, books, courses
  • Accessibility — screen readers for blind and low-vision users
  • Voice agents — give chatbots a voice
  • Localisation — voice-over videos in dozens of languages
  • IVR / phone systems — replace the old robotic phone-tree voices

Leading TTS providers:

  • ElevenLabs — high-quality voice cloning and synthesis
  • OpenAI TTS, Google Cloud TTS, Azure Speech
  • Resemble, PlayHT, Murf — production-focused tools

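Chained with speech-to-text (STT) and an LLM, TTS completes the loop for voice-conversational AI products: STT transcribes what the user says, the LLM writes a reply, and TTS speaks that reply aloud.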
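
As a rough illustration of that chain, here is a minimal sketch of one conversational turn. The three stages are passed in as plain functions; `voice_turn` and the `fake_*` stand-ins are hypothetical names used only so the example runs without any provider account. In a real product each stage would call a provider's own SDK.

```python
from typing import Callable

# Hypothetical stage signatures; real providers expose their own SDK calls.
Transcribe = Callable[[bytes], str]   # speech-to-text: audio in, text out
Generate = Callable[[str], str]       # LLM: user text in, reply text out
Synthesize = Callable[[str], bytes]   # text-to-speech: reply text in, audio out


def voice_turn(audio_in: bytes, stt: Transcribe, llm: Generate, tts: Synthesize) -> bytes:
    """Run one turn of a voice conversation: listen, think, speak."""
    user_text = stt(audio_in)     # 1. transcribe the user's speech
    reply_text = llm(user_text)   # 2. generate a text reply
    return tts(reply_text)        # 3. synthesize the reply as audio


# Stand-in stages that treat the "audio" as UTF-8 text, for demonstration only.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


audio_out = voice_turn(b"hello", fake_stt, fake_llm, fake_tts)
print(audio_out)  # b'You said: hello'
```

Passing the stages in as arguments keeps the turn logic independent of any one vendor, so a different TTS provider can be swapped in without touching the rest.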

Vol. 4 · Issue 19 · Last reviewed 2026-05-30
