Voice AI — speech-to-text, text-to-speech, voice agents, and real-time transcription. Tools that listen, speak, and hold conversations.

Explore advanced text-to-speech and voice cloning software for lifelike voiceovers and content generation.

Conversational AI that runs sales, support, and scheduling phone calls at scale.

Real-time voice AI platform with low-latency speech, cloning, and TTS APIs.

Developer platform to build, test, and deploy advanced voice agents in minutes.

World's most lifelike voice AI agents for enterprise. PolyAI deflects up to 80% of transactional calls without escalation. Used by banks, airlines, and hospitality companies.

End-to-end voice AI platform for enterprise call automation — $20M Series A (Accel), 1000+ customers, 45M calls handled, 99.9% uptime.

Automotive AI assistant platform — in-car voice AI deployed in 500M+ vehicles globally. Public ($CRNC). Spun out from Nuance.

AI phone agents that answer every call, book appointments, and handle customer support 24/7. Trusted by 10,000+ businesses, YC-backed, fast setup.

Fast, lifelike, affordable AI speech: studio-quality voice clones with 150ms latency. 24 languages. The TTS pick for cost-sensitive voice agents.

Open-source Python framework for real-time voice and multimodal conversational agents — by Daily, the WebRTC infrastructure leader. Most-used voice agent OSS.

Advanced AI speech-to-text and voice recognition solutions.

AI medical scribe automating clinical documentation across 50+ specialties. Used by 800+ healthcare orgs; saves 2-3 hours per clinician per day.

AI copilot for clinicians — ambient scribe + clinical assistant. European leader; deployed across The Permanente Medical Group's 24,000 clinicians.

Build advanced conversational voice AI with rapid response times.

Realistic conversational TTS designed specifically for voice agents and contact centers.

Meeting recording infrastructure API for AI products. Powers Otter, Granola, Read.ai, and Fathom under the hood. Sequoia-backed.

Voicemod is the real-time AI voice changer used by streamers, gamers, and creators. AI voice cloning, soundboard, text-to-speech. 30M+ users.

AI Phone Agent and Virtual Receptionist for service businesses — 3rd-gen platform with 300ms latency, 100% accuracy. Real estate, home services, contact centers.

Enterprise-grade speech-to-text and voice AI APIs from Rev — best-in-class English accuracy.

GTM AI agents for chat, email, voice & SMS — automated demos, bi-directional Salesforce sync.

Build, deploy, and scale hyperrealistic voice AI agents.

AI voice agent for sales, support, and customer engagement.

Customizable AI Sales Rep co-designed with top sales leaders — handles outreach, conversations, meeting booking. Built on Relevance AI, behavioral-data-driven personalization.

Low-code platform for ultra-realistic Voice AI Employees — receptionist agents in 3 minutes from a website URL. AI call center with 100+ concurrent calls.

AI voice agents purpose-built for compliant consumer lending — Taylor handles inbound and outbound across welcome, verification, payments, hardship, and collections.

Camb.ai is a multilingual voice cloning and translation platform supporting 140+ languages. Used by content creators, dubbing studios, and global brands.

AI-powered phone agent for missed call management.

Online text to speech converter with natural voices.

Enterprise voice AI platform with studio-quality narrated avatars. Used by Coursera, BambooHR, McKinsey for training, marketing, and product.

AI-driven platform to convert text into podcast-style audio.

Real-time AI sales coach inside the call — surfaces battlecards, objection handlers, and next-best-asks live during Zoom/Teams meetings.

AI-powered platform for conversational intelligence and voice automation.

Regal is the AI voice agent platform for sales and customer engagement. $40M+ raised, Emergence Capital-led. Used by high-velocity B2C revenue teams.

AI-powered cloud phone with voice clarity, transcription, and call routing. Virtual numbers in 100+ countries; Krisp noise cancellation built in.

Level AI is the generative AI platform for contact centers — automated QA, real-time agent assist, and call summarization. ~$65M Series C; Battery Ventures

24/7 voice AI service for restaurants to handle calls.

Real-time voice AI infrastructure. Top 3 Product of the Day. Sub-second latency for production voice agents at scale.

Boson AI is an audio foundation model company that builds the Higgs Audio models for text-to-speech, speech-to-text and audio understanding.

AI-powered voice assistant for clinicians to reduce administrative burden.

aiOla is an enterprise voice AI platform whose Jargonic ASR model turns noisy, jargon-heavy speech into structured data for frontline teams.

Open-source voice cloning and TTS — competitive with ElevenLabs at a fraction of the cost.

Ellipsis Health is a voice AI care-management platform that calls complex patients for triage, coordination, and enrollment using vocal-biomarker technology.

Soniox is a speech AI platform offering real-time multilingual speech-to-text, translation, and text-to-speech across 60+ languages through one API.

Voice AI that automates healthcare revenue cycle calls — eligibility, prior auth, claim status, credentialing. $15M Series A in early 2026.

Aktify is the autonomous AI SMS sales rep that texts inbound and outbound leads at scale. Handles two-way conversations, qualifies leads, and books meetings.
It depends on the task. For text-to-speech and voice cloning, ElevenLabs and Cartesia lead; for AI phone calls, Bland and Vapi build agents that talk to customers; for transcription, Deepgram converts speech to text in real time. Choose by whether you need to generate a voice, run a conversation, or transcribe audio.
ElevenLabs is the most widely used for natural, expressive synthetic voices and cloning, with Cartesia and LMNT competing on low latency for real-time apps. ElevenLabs covers many languages and a large voice library, while Cartesia targets fast, streaming generation. Compare on the voices and latency your use case needs.
An AI voice agent answers and places phone calls autonomously, understanding speech, responding in a natural voice, and taking actions like booking or routing. Bland, Vapi, and PolyAI power use cases from receptionists to support lines. The agent combines speech-to-text, a language model, and text-to-speech into one real-time loop.
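The real-time loop described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: `transcribe`, `respond`, and `synthesize` are hypothetical stand-ins for a speech-to-text model, a language model, and a text-to-speech model, and audio is faked as UTF-8 bytes so the example runs without external services.

```python
def transcribe(audio: bytes) -> str:
    """Hypothetical speech-to-text stand-in: treats the audio bytes as UTF-8 text."""
    return audio.decode("utf-8")


def respond(text: str) -> str:
    """Hypothetical language-model stand-in: a fixed rule instead of a real model."""
    if "book" in text.lower():
        return "Sure, I can book that for you."
    return "Could you tell me more?"


def synthesize(text: str) -> bytes:
    """Hypothetical text-to-speech stand-in: encodes the reply text as bytes."""
    return text.encode("utf-8")


def handle_turn(audio_in: bytes) -> bytes:
    """One pass of the loop: listen, decide, speak."""
    heard = transcribe(audio_in)
    reply = respond(heard)
    return synthesize(reply)


if __name__ == "__main__":
    print(handle_turn(b"I want to book a table"))
```

In a production agent each stand-in would be a streaming call to a real model, and the loop would run continuously per caller turn rather than once.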
Most teams use a platform that bundles the pieces rather than wiring them by hand. Vapi and Synthflow handle telephony, turn-taking, and the speech models so you focus on the conversation logic, while ElevenLabs or Cartesia supply the voice and Deepgram the transcription. You define the script, tools, and handoff rules.
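As a rough illustration of what "script, tools, and handoff rules" might look like in practice, here is a small sketch. The `AgentConfig` class, its field names, and the keyword-based handoff check are all assumptions made for this example; they do not reflect the configuration format of Vapi, Synthflow, or any other platform.

```python
from dataclasses import dataclass, field


@dataclass
class AgentConfig:
    """Hypothetical agent definition: what to say, what it may do, when to escalate."""
    script: str
    tools: list[str] = field(default_factory=list)
    handoff_keywords: list[str] = field(default_factory=list)

    def should_hand_off(self, utterance: str) -> bool:
        """Return True when the caller's words match any handoff keyword."""
        lowered = utterance.lower()
        return any(keyword in lowered for keyword in self.handoff_keywords)


# Illustrative values only.
config = AgentConfig(
    script="Greet the caller, then offer to book an appointment.",
    tools=["book_appointment"],
    handoff_keywords=["human", "manager"],
)

if __name__ == "__main__":
    print(config.should_hand_off("Let me talk to a human"))
```

Real platforms typically express handoff rules with richer conditions (intent, sentiment, failed attempts), but the division of responsibility is the same: the platform runs the call, and you supply this kind of definition.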
Yes. Tools like ElevenLabs create a synthetic copy of a voice from a short sample, used for narration, localization, and accessibility. Because cloning can be misused, reputable tools require consent and add safeguards, and several regions now regulate synthetic voice. Use cloning only with permission from the voice owner.
Speech-to-text transcribes spoken audio into written words, which powers captions, transcription, and the listening side of voice agents, as Deepgram does. Text-to-speech does the reverse, turning written text into spoken audio, as ElevenLabs does. A full voice agent uses both, plus a language model in between.
Receive weekly updates so you can stay up-to-date with the world of AI