Speech-to-Text
AI that converts spoken audio into written text — the technology behind voice assistants, transcription tools, and meeting recorders.
In plain English
Speech-to-text — also called Automatic Speech Recognition (ASR) — is AI that turns recorded or live audio of human speech into written text. It powers transcription services, voice assistants, captioning, and voice control.
Common applications:
- Meeting transcripts — Otter, Fireflies, Granola
- Voice assistants — Siri, Alexa, ChatGPT voice
- Subtitles & captions — YouTube, Zoom, podcasts
- Voice typing — dictation in docs, emails, code editors
- Call analytics — sales call coaching, support QA
Modern ASR: OpenAI's Whisper, released in 2022, reshaped the field by being open-source, multilingual, and highly accurate. Most current transcription products build on Whisper or on a competing model or API such as AssemblyAI, Deepgram, or Google's Chirp.
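As a rough illustration of what using Whisper looks like, here is a minimal sketch that assumes the open-source `openai-whisper` Python package and ffmpeg are installed. The helper names (`format_timestamp`, `format_segments`, `transcribe_file`) and the file name `meeting.mp3` are made up for this example; only `whisper.load_model` and `model.transcribe` come from the package itself.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments) -> str:
    """Turn Whisper-style segments into one timestamped line per segment.

    Each segment is expected to be a dict with at least "start" (seconds)
    and "text", which is the shape Whisper returns under result["segments"].
    """
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        lines.append(f"[{start}] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe_file(path: str, model_name: str = "base") -> str:
    """Transcribe an audio file with Whisper and return timestamped text.

    Hypothetical helper: the import is done lazily so the formatting
    functions above can be used without Whisper installed.
    """
    import whisper  # assumption: installed via `pip install openai-whisper`

    model = whisper.load_model(model_name)  # downloads weights on first use
    result = model.transcribe(path)         # dict with "text" and "segments"
    return format_segments(result["segments"])
```

Calling `transcribe_file("meeting.mp3")` would return the transcript as lines such as `[00:01:05] Next item.`, which is roughly the raw material a meeting-notes or captioning tool starts from. Larger model names generally trade speed for accuracy.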
Accuracy on clean English audio is now close to that of human transcribers; heavy accents, overlapping speakers, and noisy environments remain harder.
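Claims about transcription quality are usually backed by word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the system's output into a reference transcript, divided by the number of words in the reference. The sketch below computes it with a standard edit-distance table. The function name is illustrative, and it only lowercases the text; real evaluations typically also normalize punctuation and number formatting first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two transcripts,
    divided by the number of words in the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # dropping all i reference words costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "the cat sat on the mat" with the output "the cat sat on a mat" gives one substitution over six reference words, a WER of 1/6. Lower is better, and WER can exceed 1.0 when the output contains many extra words.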