Hume AI is a New York-based AI lab building empathic voice models — speech systems designed to recognize and respond to emotional expression in addition to words. The company's flagship technology centers on two model families: EVI (Empathic Voice Interface), a speech-to-speech conversational model, and Octave, a text-to-speech model with adjustable emotional and stylistic controls. Together, they aim to make voice AI feel more natural by attending to prosody, tone, and affect.

The company was founded in 2021 by CEO Alan Cowen, a former Google researcher who specialized in the science of emotion. Hume's research foundation is unusual for an AI startup: the team has published peer-reviewed work on emotional taxonomies and uses large datasets of human vocal expression to train its models. That research positioning differentiates Hume from generic TTS and conversation API vendors.

Hume announced a $50M Series B in March 2024, led by EQT Ventures with participation from Union Square Ventures, Nat Friedman and Daniel Gross, Metaplanet, Northwell Holdings, Comcast Ventures, and LG Technology Ventures. Reported total funding sits around $74M. The company has cited customers including Anthropic, Replit, and Lawyer.com, which use its voice technology in production agents and assistants.

On the product side, EVI has progressed through multiple generations, most recently EVI 3 and EVI 4 mini, gaining expressiveness, customizability, and lower latency along the way. Octave 2 is the second generation of Hume's frontier text-to-speech model: it supports 11 languages, generates audio in under 200 milliseconds, and costs roughly half as much as Octave 1. Both lines are accessible via API and are positioned for voice agents, accessibility, education, mental wellness, and creative use cases.

Hume competes with ElevenLabs, OpenAI's voice models, PlayHT, and emerging speech-to-speech players. Its distinguishing angle is empathy and emotional intelligence: rather than optimizing only for naturalness, Hume optimizes for systems that respond appropriately to how a user actually feels. That positioning is particularly relevant in agent, therapy-adjacent, and customer-facing voice applications.