Cartesia is a voice AI company building ultra-low-latency speech models for real-time applications. Its core technology is based on state space models (SSMs), an architecture its founding team helped pioneer. SSMs are designed to be more efficient than transformer-based alternatives. That efficiency enables fast, natural conversational speech and makes deployment on devices and at the edge practical.
Cartesia was founded in 2023 by Karan Goel, Albert Gu, Brandon Yang, Arjun Desai, and Chris Ré. The founders met as researchers at the Stanford AI Lab, where they helped develop state space models. Cartesia's flagship Sonic text-to-speech models emphasize very low time-to-first-audio and human-like delivery, including emotional nuance. This supports the responsiveness that live voice agents and interactive applications require.
The company positions its models for production voice applications where latency directly affects user experience, such as voice agents, customer interactions, and real-time assistants. Successive Sonic releases have expanded language support and improved naturalness, with the company reporting end-to-end latency low enough for fluid back-and-forth conversation across many languages.
Cartesia has attracted significant venture investment. It raised a Series A led by Kleiner Perkins, followed by a $100 million round with participation from Kleiner Perkins, Index Ventures, Lightspeed, and NVIDIA, bringing total funding to roughly $164 million. Backing from both an infrastructure company and venture firms reflects interest in efficient real-time voice models as adoption of conversational AI grows.
Because voice AI is a competitive and rapidly advancing field, organizations evaluating Cartesia should benchmark latency, voice quality, and language coverage against their own use case. They should also confirm that deployment options and pricing fit their real-time application requirements.
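Latency is the easiest of these criteria to measure independently. The following is a minimal, vendor-neutral sketch that times a streaming speech response. It assumes only that the client under test can be wrapped in a callable that starts a request and returns an iterable of audio byte chunks. `measure_stream_latency` and `fake_tts_stream` are hypothetical names used for illustration. They are not part of any Cartesia SDK, and `fake_tts_stream` merely simulates a stream with fixed delays.

```python
import statistics
import time
from typing import Callable, Iterable, Iterator, Optional


def measure_stream_latency(start_stream: Callable[[], Iterable[bytes]]) -> dict:
    """Time one streaming TTS request: time to first audio chunk and total time.

    The clock starts before `start_stream` is called, so connection setup and
    request submission are included in time-to-first-audio.
    """
    start = time.perf_counter()
    first_chunk_s: Optional[float] = None
    total_bytes = 0
    for chunk in start_stream():
        if first_chunk_s is None:
            # Record the moment the first audio bytes arrive.
            first_chunk_s = time.perf_counter() - start
        total_bytes += len(chunk)
    total_s = time.perf_counter() - start
    return {
        "time_to_first_audio_s": first_chunk_s,  # None if the stream was empty
        "total_s": total_s,
        "bytes": total_bytes,
    }


def fake_tts_stream(n_chunks: int = 5, delay_s: float = 0.02) -> Iterator[bytes]:
    """Hypothetical stand-in for a real streaming client: yields silent chunks."""
    for _ in range(n_chunks):
        time.sleep(delay_s)    # simulate network and synthesis delay per chunk
        yield b"\x00" * 320    # 320 bytes of silence per chunk


if __name__ == "__main__":
    # Repeat the request and report the median, since single runs are noisy.
    runs = [measure_stream_latency(lambda: fake_tts_stream()) for _ in range(5)]
    ttfa = [r["time_to_first_audio_s"] for r in runs]
    print(f"median time-to-first-audio: {statistics.median(ttfa) * 1000:.1f} ms")
```

To test a real service, replace the lambda with a function that sends the request through that vendor's documented streaming API and yields its audio chunks. Run the test from the network location where the application will actually be deployed, because the measured figures include network round-trip time.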