On February 4, 2026, ElevenLabs closed $500M at an $11B valuation, with Sequoia leading and a16z and Iconiq alongside, and six weeks later it shipped 11.ai, an MCP-native voice assistant for running daily workflows. The category that NeuronFeed indexes (37 startups, $1.9B disclosed) was already moving fast; that single round reset the valuation benchmark for everyone else.
Cartesia answered with Sonic-2, a state-space model tuned for streaming inference that pushes end-to-end latency under 90ms, a threshold that matters a lot when you're building a phone agent. Deepgram now ships a full speech-to-speech stack on top of its ASR business ($109M Series C). AssemblyAI and Hume AI ($73.9M Series B, paralinguistic emotion) make up the next tier. Hippocratic AI ($335M Series B) deploys safety-trained voice agents into US healthcare networks.
The Whisper effect, three years later
OpenAI's open-sourcing of Whisper in 2022 is still depressing ASR pricing. Margins on raw transcription collapsed; the survivors moved up-stack into agents, dubbing, and clinical scribes. Suki AI ($70M Series C) is a clinical-scribe pure-play. Murf AI (Bangalore, $13M seed) keeps a 20-language TTS franchise without ever raising at ElevenLabs scale, and DeepL's voice extension entered live meeting translation last year.
The lawsuit overhang
Music AI is the cautionary tale next door: Suno and Udio are still defending RIAA lawsuits filed in 2024, and discovery has dragged into 2026. Voice-cloning vendors took the lesson and built consent flows early. ElevenLabs requires verified voice ownership; Resemble AI ships deepfake detection in the same SDK as its synthesis API. Tennessee's ELVIS Act and California's AB 2602, both passed in 2024, target AI voice replicas, and the EU AI Act adds a labelling requirement for synthetic voice. Compliance tooling is now a feature, not an afterthought.
What's next
OpenAI's Realtime API and Google's Gemini Live both collapse ASR, dialogue, and TTS into a single model. The defensible bets for standalone vendors are latency at the edge (Cartesia), enterprise integration (Hippocratic, PolyAI), or vertical workflow ownership (Suki for clinical documentation, Speak for language learning). Bland AI and Retell AI are racing for the SMB outbound-dial wedge.