AI audio tools — text-to-speech, voice cloning, sound effects, and editing. Generate and shape speech, voices, and soundscapes from text or samples.

Explore advanced text-to-speech and voice cloning software for lifelike voiceovers and content generation.

Voice AI for meetings — #1 noise cancellation, AI Note Taker, Accent AI, and Call Center AI.

Conversational AI that runs sales, support, and scheduling phone calls at scale.

ChatGPT-powered AI voice recorder — 0.12in thin, 30-hour recording, 60-day standby. Records, transcribes, and summarizes calls and meetings.

Real-time voice AI platform with low-latency speech, cloning, and TTS APIs.

Voice-first AI companion — natural conversational voice and lightweight all-day eyewear.

Build voice, video, and physical AI agents on real-time infrastructure — open-source LiveKit Agents framework + LiveKit Cloud managed deployment. Series C-funded.

World's most lifelike voice AI agents for enterprise — PolyAI deflects up to 80% of transactional calls without escalation. Banks, airlines, hospitality use it.

AI generates sound effects, music, and ambience from your video in seconds. Web studio + API for creators and developers. $41M seed in early 2026.

Enhance voice recordings for a professional podcasting studio feel.

Sonix offers automated transcription, translation, and subtitling for audio and video files in over 40 languages.

End-to-end voice AI platform for enterprise call automation — $20M Series A (Accel), 1000+ customers, 45M calls handled, 99.9% uptime.

AI speech technology for accurate transcription and real-time translation.

Automotive AI assistant platform — in-car voice AI deployed in 500M+ vehicles globally. Public ($CRNC). Spun out from Nuance.

AI phone agents that answer every call, book appointments, and handle customer support 24/7 — trusted by 10,000+ businesses, YC-backed, fast setup.

Fast, lifelike, affordable AI speech — studio-quality voice clones with 150ms latency. 24 languages. The TTS pick for cost-sensitive voice agents.

Open-source Python framework for real-time voice and multimodal conversational agents — by Daily, the WebRTC infrastructure leader. Most-used voice agent OSS.

Stability AI's audio model — generates structured tracks up to 3 minutes at 44.1kHz, transforms samples, and produces sound effects from text prompts.

Generative voice AI plus deepfake detection — voice cloning, TTS, and content security.

Advanced AI Speech-to-Text and Voice Recognition Solutions

Speechify Studio - AI Voice Generator

Unlock the potential of audio data with transcription, translation, and audio intelligence.

Verbit offers professional transcription and captioning services with high accuracy.

Riverside.fm offers studio-quality podcast and video recording online.

Real-time speech-native multimodal LLM — Ultravox understands audio directly without separate ASR, achieving 150ms TTFT. Open weights, by Fixie AI.

LOVO AI is an award-winning AI voice generator and text-to-speech software with over 500 voices in 100 languages, including realistic AI voices and online video editor capabilities.

Muah AI is an AI companion platform (18+) with text chat, AI voice calls, and photo generation.

Voicemod is the real-time AI voice changer used by streamers, gamers, and creators. AI voice cloning, soundboard, text-to-speech. 30M+ users.

7ART is an AI character platform that generates music, video, images, and voice from one consistent character.

AI Phone Agent and Virtual Receptionist for service businesses — 3rd-gen platform with 300ms latency, 100% accuracy. Real estate, home services, contact centers.

Effortless podcast creation for the modern era.

Enterprise-grade speech-to-text and voice AI APIs from Rev — best-in-class English accuracy.

GTM AI agents for chat, email, voice & SMS — automated demos, bi-directional Salesforce sync.

Create customizable royalty-free music with advanced mood-based music generation

Build, deploy, and scale hyperrealistic voice AI agents.

Generative AI filmmaking tools for cinematic quality VFX.

Deepdub offers AI-driven solutions to reimagine global entertainment experiences, providing high-quality localization at scale.

Split vocals & instruments with LALAL.AI's precise AI.

Transform long-form audio into diverse content assets.

Low-code platform for ultra-realistic Voice AI Employees — receptionist agents in 3 minutes from a website URL. AI call center with 100+ concurrent calls.

AI voice agents purpose-built for compliant consumer lending — Taylor handles inbound and outbound across welcome, verification, payments, hardship, and collections.

Wearable personal AI — 7-day battery, dual-mic ambient capture, transforms your conversations into summaries, insights, and reminders. Wrist or clip-on.

AI-driven app that structures voice and text notes.

Camb.ai is a multilingual voice cloning and translation platform supporting 140+ languages. Used by content creators, dubbing studios, and global brands.

Effortless video and audio transcription app.

Online text to speech converter with natural voices.

AI voice and video generator with 900+ voices in 142 languages.

Enterprise voice AI platform with studio-quality narrated avatars. Used by Coursera, BambooHR, McKinsey for training, marketing, and product.

AI-driven platform to convert text into podcast-style audio.

Revolutionary AI-powered audio transcription solution.

Transcribe audio and video into text with ExemplaryAI.

AI platform for creating music covers using distinct voice models.

Leading Speech AI models for transcribing speech to text and extracting insights from voice data.

Forever Voices: Transforming your personal stories into unique, AI-generated voice experiences for future generations.

Ultra-realistic AI voice generator — fastest TTS API for voice agents, plus Studio and AI Dubbing.

Effortlessly convert text to lifelike speech.

A revolutionary platform for voice processing and recognition.

AI-driven platform for content creation, from text-to-speech to video editing.

Boson AI is an audio foundation model company that builds the Higgs Audio models for text-to-speech, speech-to-text and audio understanding.

Generative AI for scalable voice, video, and image content.

aiOla is an enterprise voice AI platform whose Jargonic ASR model turns noisy, jargon-heavy speech into structured data for frontline teams.

Open-source voice cloning and TTS — competitive with ElevenLabs at a fraction of the cost.

Ellipsis Health is a voice AI care-management platform that calls complex patients for triage, coordination, and enrollment using vocal-biomarker technology.

AI-powered tool for enhancing and editing audio recordings.

AI-powered platform for generating infinite, royalty-free drums.

Generate podcast content, including show notes, timestamps, and more with AI.

AI-powered platform for summarizing articles into audio podcasts.

Eddy helps transcribe, edit, and promote podcasts for free. Boost accessibility and optimize your podcast for SEO with Eddy.

Online platform for converting audio to text.

Pioneers in audio technology and vocal processing software.

Transform podcasts into engaging written assets swiftly and accurately.

Enhance your digital presence with advanced voice AI technology.

Seamless podcast recording & editing with AI.

Podcast hosting platform for creation, promotion, and monetization.

Premium podcast clips on-demand for enhanced brand socials.