AssemblyAI is a San Francisco-based AI research company that builds state-of-the-art speech models accessible via API. Its core product is a transcription and audio intelligence API that handles speech-to-text, speaker diarisation, sentiment analysis, topic detection, content moderation, summarisation, and PII redaction across both pre-recorded and streaming audio. The platform powers production features inside companies like Zoom and Spotify and is used by more than 20,000 developers.
The company was founded in 2017 by CEO Dylan Fox, previously a senior engineer at Cisco, who took it through Y Combinator as a solo founder betting on speech AI. AssemblyAI has raised approximately $78M to date across seed, Series A, and Series B rounds, with backing from Accel, Insight Partners, Y Combinator, and others. Daniel Gross became an early investor after the company's Y Combinator interview.
AssemblyAI's flagship model is Universal-2, the successor to Universal-1. It is trained for high accuracy on noisy real-world audio, with strong handling of accents, rare names, brands, and locations. The company has continued to expand language coverage and now supports 99 languages at a flat rate. The product line is API-first: developers send audio and receive structured JSON, with optional add-ons for additional intelligence layers.
Pricing is published openly: pay-as-you-go transcription starts at $0.15 per hour for Universal-2, with per-feature add-ons for speaker diarisation, sentiment, PII redaction, and summarisation charged on top. Multilingual transcription is billed at a flat rate, which suits global use cases. Enterprise commitments and volume discounts are available.
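To make the pay-as-you-go arithmetic concrete, here is a minimal Python sketch of a cost estimate. Only the $0.15 per hour base rate comes from the figures above. The function name, the add-on names and rates in the usage note, and the assumption that each add-on is billed per hour of audio are hypothetical, chosen purely for illustration; actual add-on prices should be taken from the published price list.

```python
from decimal import Decimal

# Base rate quoted in the text: $0.15 per audio hour for Universal-2.
BASE_RATE_PER_HOUR = Decimal("0.15")


def estimate_cost(audio_hours, addon_rates_per_hour=None,
                  base_rate=BASE_RATE_PER_HOUR):
    """Estimate pay-as-you-go cost in USD, rounded to cents.

    audio_hours: total hours of audio to transcribe.
    addon_rates_per_hour: optional mapping of add-on name to a per-hour
        rate in USD. These rates are supplied by the caller; none are
        taken from a real price list.
    """
    hours = Decimal(str(audio_hours))
    if hours < 0:
        raise ValueError("audio_hours must be non-negative")

    addons = addon_rates_per_hour or {}
    # Assumption: each add-on is charged per hour of audio, on top of the base rate.
    addon_total = sum((Decimal(str(rate)) for rate in addons.values()),
                      Decimal("0"))

    return (hours * (base_rate + addon_total)).quantize(Decimal("0.01"))
```

For example, `estimate_cost(1000)` gives 150.00 for transcription alone. With made-up add-on rates such as `{"speaker_diarisation": 0.02, "pii_redaction": 0.08}`, the combined rate becomes $0.25 per hour and the same 1,000 hours comes to 250.00. Volume discounts and enterprise commitments are not modelled.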
AssemblyAI competes with Deepgram, Google Speech-to-Text, OpenAI Whisper-based offerings, and Microsoft Azure Speech. Its bet is that purpose-built speech research, combined with a developer-friendly API and audio intelligence beyond raw transcription, gives it a durable advantage with product teams building anything from meeting summarisation to call analytics to content moderation.