NeuronFeed

AI Voice & Speech startups (2026)

ElevenLabs at $11B, Cartesia's Sonic-2 streaming, and the voice-agent stack racing to replace IVR.

124 AI voice and speech startups tracked, most of them in the US. Total tracked funding: $4.8B.

Tracked: 124
Total raised: $4.8B
Countries: 19
Active deals: 4

Funding by year, AI Voice & Speech (2019 to 2026)

  • 2019: $12M
  • 2022: $152M
  • 2023: $164.1M
  • 2024: $224M
  • 2025: $1.7B
  • 2026: $242.7M

Market overview

On February 4, 2026, ElevenLabs closed $500M at an $11B valuation, led by Sequoia with a16z and Iconiq participating, and six weeks later it shipped 11.ai, an MCP-native voice assistant that runs daily workflows by voice. The category NeuronFeed indexes (37 startups, $1.9B in disclosed funding) was already moving fast; that single round reset valuation benchmarks for everyone else.

Cartesia answered with Sonic-2, a state-space model tuned for streaming inference that brings end-to-end latency under 90ms, a small number that matters a lot when you're building a phone agent. Deepgram now ships a full speech-to-speech stack on top of its ASR business, which is backed by a $109M Series C. AssemblyAI and Hume AI ($73.9M Series B, focused on paralinguistic emotion recognition) make up the next tier. Hippocratic AI ($335M Series B) deploys safety-trained voice agents into US healthcare networks.

The Whisper effect, three years on

OpenAI's 2022 open-sourcing of Whisper is still depressing ASR pricing. Margins on raw transcription collapsed, and the survivors moved up-stack into agents, dubbing, and clinical scribes. Suki AI ($70M Series C) is a clinical-scribe pure-play. Murf AI (Bangalore, $13M seed) sustains a 20-language TTS franchise without ever raising at ElevenLabs scale, and DeepL's voice extension moved into live meeting translation last year.

The lawsuit overhang

Music AI is the cautionary tale next door: Suno and Udio are still defending RIAA lawsuits filed in 2024, and discovery has dragged into 2026. Voice-cloning vendors took the lesson and built consent flows early. ElevenLabs requires verified voice ownership, and Resemble AI ships deepfake detection in the same SDK as its synthesis API. Regulation is catching up: Tennessee's ELVIS Act (2024) and California's AB 2602 (in force since January 2025) already protect performers against unauthorized voice replicas, and the EU AI Act's labelling requirement for synthetic audio is scheduled to apply from August 2026. Compliance tooling is now a feature, not an afterthought.

What's next

OpenAI's Realtime API and Google's Gemini Live both fold TTS, ASR, and dialogue into a single model. The defensible bets for standalone vendors are latency at the edge (Cartesia), enterprise integration (Hippocratic, PolyAI), or vertical workflow ownership (Suki for clinical, Speak for language learning). Bland AI and Retell AI are racing for the SMB outbound-dialing wedge.

Key trends 2026

  • Sub-100ms latency is the new bar. Cartesia's Sonic-2 and Deepgram's speech-to-speech stack both run under 90ms end-to-end, so phone agents feel like real calls instead of demos.
  • MCP-native voice arrives. 11.ai (March 2026) is the first major voice assistant built on the Model Context Protocol (MCP), turning voice into a control surface for the entire AI stack rather than just a dictation tool.
  • Music-AI lawsuits chill voice cloning. Suno and Udio's RIAA litigation pushed the whole synthesis stack to ship consent flows, watermarking, and provenance tooling ahead of regulation.
  • Whisper killed ASR-as-a-service margins. AssemblyAI and Deepgram both moved up-stack into agents and full pipelines because raw transcription is now a commodity priced near zero.

Benchmarks vs global

  • ElevenLabs valuation (Feb 2026): $11B (Series D, $500M raise)
  • Cartesia Sonic-2 latency: under 90ms (streaming TTS, production-ready)
  • Total funding tracked: $1.9B (ElevenLabs alone accounts for ~49%)
  • Tracked startups: 37 (13 US, 3 UK, 2 India, the rest spread across other countries)

Stage breakdown (by latest round type)

  • Seed 50
  • Series A 28
  • Series B 10
  • Pre-Seed 7
  • Series C 6
  • Pre-Series A 3
  • Growth 2
  • Series D 1

FAQ

What changed with the ElevenLabs Series D?
On February 4, 2026, ElevenLabs raised $500M at an $11B valuation, led by Sequoia with a16z and Iconiq Capital. Total raised is now $922M. The round funded 11.ai (an MCP-native voice assistant launched in March 2026), the v3 expressive TTS model, and a beta image-and-video stack that bundles voice with multimedia generation. The valuation reset the pricing benchmark for every other voice-AI company.
Why does Cartesia matter with only a $65M Series A?
Cartesia ships Sonic-2, built on a state-space architecture tuned for streaming rather than on a transformer. Latency lands under 90ms end-to-end, at a lower cost per second than the ElevenLabs API. For voice agents holding live phone conversations, those numbers are decisive: investors are betting that runtime economics outweigh absolute audio fidelity at scale.
How are frontier voice modes pressuring the stack?
OpenAI's Realtime API and Google's Gemini Live fold TTS, ASR, and dialogue into a single model, and they handle most casual use for free or near-free inside chat apps. Standalone vendors compete on latency at the edge (Cartesia), regulated vertical workflows (Suki for clinical scribing, Hippocratic for healthcare agents), or developer ergonomics (Deepgram, AssemblyAI).
Is voice cloning legal in 2026?
It depends on jurisdiction and consent. The EU AI Act requires synthetic voice to be labelled, with that obligation scheduled to apply from August 2026. Tennessee's ELVIS Act and California's AB 2602 restrict commercial cloning without the performer's consent. Suno and Udio's ongoing RIAA lawsuits made the legal exposure concrete, so ElevenLabs, Resemble, and Murf all require verified ownership and embed watermarks by default.
How big is the voice-agent opportunity vs creator TTS?
Larger by an order of magnitude. Customer support, outbound sales, healthcare follow-up, and IVR replacement together represent tens of billions of dollars in legacy spend. A voice agent that resolves 5-minute calls and hands only 3% of them off to a human replaces that work at a fraction of the cost. Hippocratic AI ($335M Series B) and PolyAI are the credible enterprise plays; Bland AI and Retell AI are racing for the SMB tier.

Recent rounds in AI Voice & Speech

  • May 2026: Vapi, Series B, $50M
  • Mar 2026: ActionPower, Series B, $4.1M
  • Feb 2026: ElevenLabs, Series D, $500M
  • Feb 2026: Newo.ai, Series A, $25M
  • Jan 2026: Bolna AI, Seed, $6.3M
  • Jan 2026: Deepgram, Series C, $130M
  • Jan 2026: Tucuvi, Series A, $20M

All AI Voice & Speech startups

  • Ringg AI (est. 2023). Raised $5.5M, Series A, score 66.
  • Tucuvi (ES, est. 2019). Clinically validated AI voice agent for autonomous patient follow-up. Raised $20M, Series A, score 65.
  • HappyRobot (US, est. 2023). AI voice agents that automate logistics and freight communications. Raised $15.6M, Series A, score 65.
  • Hippocratic AI (US, est. 2023, verified). Safe AI agents for healthcare. Raised $461M, Series C, score 64.
  • Dubformer (NL, est. 2023). AI dubbing studio for accurate, expressive video localization in 140+ languages. Raised $3.6M, Seed, score 64.
  • GetVocal (FR, est. 2023). Governed conversational AI phone agents for regulated enterprises. Raised $30M, Series A, score 64.
  • Beside (FR, est. 2024). An AI receptionist for the real economy that never misses a call. Raised $32M, Series A, score 64.
  • ai-coustics (DE, est. 2021). Real-time speech enhancement that makes voice AI work at scale. Raised $5.4M, Seed, score 64.
  • AudioStack (GB, est. 2019). AI audio production platform for enterprise-scale voice content. Raised $10.6M, Pre-Series A, score 64.
  • Hamming AI (est. 2024). Raised $3.8M, Seed, score 64.
  • Rime (est. 2022). Raised $5.5M, Seed, score 64.
  • AethexAI (NG, est. 2025). Raised $3M, Pre-Seed, score 64.
  • pyannoteAI (est. 2024). Raised $9M, Seed, score 64.
  • Kotoba Technologies (JP, est. 2023). Real-time AI simultaneous speech translation and interpretation. Raised $13.3M, Seed, score 63.
  • SuperDial (US, est. 2023). AI voice agents that automate healthcare's endless administrative phone calls. Raised $20M, Series A, score 63.
  • Leaping AI (US, est. 2023). Human-like AI voice agents for companies and call centers. Raised $4.7M, Seed, score 63.
  • Iconic (GB, est. 2023). On-device, voice-driven AI for living, AI-native game worlds. Raised $13M, Seed, score 63.
  • Palabra AI (GB, est. 2023). Real-time AI voice translation that replaces interpreters. Raised $8.4M, Pre-Seed, score 63.
  • Buddy.ai (US, est. 2017). AI English tutor for kids using voice-based learning games. Raised $19M, Seed, score 63.
  • Thoughtly (est. 2023). Raised $8.5M, Seed, score 63.
  • Vox AI (NL, est. 2023). Autonomous voice AI for restaurant drive-thrus. Raised $10M, Seed, score 62.
  • Phonely (US, est. 2023). AI receptionist that answers FAQs, routes calls, and books appointments. Raised $19M, Series A, score 62.
  • Ello (US, est. 2019). AI reading coach that listens to kids read aloud and helps them improve. Raised $15.1M, Series A, score 62.
  • Flai (US, est. 2024). Omni-channel AI assistants that replace dealership phone trees. Raised $6M, Seed, score 62.