Voice Agent
A real-time conversational AI you talk to — over the phone, in an app, or through a wearable — that listens, reasons, and replies in voice.
In plain English
A voice agent is an AI agent whose primary interface is spoken conversation. It listens to the user (speech-to-text), reasons about a response (LLM), and speaks back (text-to-speech), usually with low enough latency to feel like a real conversation.
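The listen, reason, speak loop described above can be sketched as a single turn handler. This is a minimal illustration, not any vendor's API: `VoiceAgent`, `handle_turn`, and the three stand-in stage functions are hypothetical names, and a real system would call actual speech-to-text, LLM, and text-to-speech services (usually streaming) where the stubs are.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stage signatures (assumptions for illustration only).
Transcriber = Callable[[bytes], str]                # speech-to-text
Responder = Callable[[List[Tuple[str, str]]], str]  # LLM over the conversation so far
Synthesizer = Callable[[str], bytes]                # text-to-speech


@dataclass
class VoiceAgent:
    transcribe: Transcriber
    respond: Responder
    synthesize: Synthesizer
    history: List[Tuple[str, str]] = field(default_factory=list)

    def handle_turn(self, audio_in: bytes) -> bytes:
        """Run one conversational turn: listen, reason, speak."""
        user_text = self.transcribe(audio_in)
        self.history.append(("user", user_text))
        reply_text = self.respond(self.history)
        self.history.append(("agent", reply_text))
        return self.synthesize(reply_text)


# Stand-in stages so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(history: List[Tuple[str, str]]) -> str:
    last_user_text = history[-1][1]
    return f"You said: {last_user_text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


agent = VoiceAgent(fake_stt, fake_llm, fake_tts)
audio_out = agent.handle_turn(b"where is my order")
```

Keeping each stage behind a plain callable means any one of them can be swapped without touching the turn logic. Running the stages strictly in sequence, as here, is the simplest arrangement but also the slowest; production systems typically stream between stages to cut the delay.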
What makes a good voice agent:
- Low latency — a gap under 800 ms between the user finishing and the agent replying feels natural; over 1.5 s feels broken
- Interruption handling — barge-in support, so the agent stops talking when the user cuts in instead of speaking over them
- Persona consistency — the voice, tone, and personality stay coherent
- Tool use — book appointments, look up orders, transfer to a human
- Memory — recognises returning callers and remembers context
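The latency thresholds in the list above can be turned into a rough budget check. This is a sketch under stated assumptions: the 800 ms and 1.5 s cut-offs come from the text, but the "noticeable" label for the zone in between, the function names, and the per-stage numbers are all illustrative and not measurements of any real system.

```python
NATURAL_MS = 800   # below this, turn-taking feels natural (from the text)
BROKEN_MS = 1500   # above this, the conversation feels broken (from the text)


def turn_latency_ms(stage_ms: dict) -> float:
    """Total delay for one turn, assuming the stages run one after another."""
    return sum(stage_ms.values())


def classify_turn_latency(total_ms: float) -> str:
    """Map a turn's total delay onto the thresholds above."""
    if total_ms < NATURAL_MS:
        return "natural"
    if total_ms > BROKEN_MS:
        return "broken"
    # The text does not name the middle zone; "noticeable" is an assumed label.
    return "noticeable"


# Hypothetical per-stage delays in milliseconds, for illustration only.
example_budget = {
    "speech_to_text": 200,
    "llm_first_token": 350,
    "text_to_speech_first_audio": 150,
}

total = turn_latency_ms(example_budget)
verdict = classify_turn_latency(total)
```

Budgeting per stage makes it clear where time goes: with these illustrative numbers, the turn only stays in the natural range because each stage is measured to its first output rather than to completion.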
Where they're deployed:
- Customer service — Sierra, Decagon, Parloa, Cresta
- Outbound sales — Air, Bland
- Healthcare — Hippocratic AI, Suki
- Real estate / scheduling — Lindy, Goodcall, Synthflow
- Developer infrastructure — Vapi, Retell, ElevenLabs Conversational AI
Voice agents are eating phone trees (the "press 1 for billing" IVR menus). The bar for what counts as "the human alternative" has risen dramatically since 2024.