Coval is an evaluation and simulation platform purpose-built for production AI voice and chat agents. The product lets engineering teams define agent specifications, generate thousands of simulated conversation scenarios, run them against deployed agents, and grade the results with both built-in and custom metrics. Specific capabilities include tool-call validation, latency and interruption tracking, voice realism testing, load and permutation testing, and CI/CD-style regression gates that block bad agent deploys.
The company was founded in 2024 by Brooke Hopkins, formerly the engineering lead for evaluation infrastructure at Waymo, where she shipped the simulation systems used to test autonomous vehicles before they touched public roads. Coval was conceived during her time in Y Combinator's Summer 2024 batch and launched publicly in October 2024. The team is headquartered in San Francisco and remains lean, with around 10 employees.
Coval announced its $3.3M seed round in January 2025, led by MaC Venture Capital with participation from Y Combinator, General Catalyst, Fortitude Ventures, Pioneer Fund, Lombard Street Ventures, and a group of angel investors. Customer growth has been driven primarily by voice-agent companies and YC alumni. Published Latka data put 2025 revenue near $550K with a five-person team at the time, or roughly $110K in revenue per employee, suggesting strong per-headcount efficiency.
The product is delivered as a managed SaaS with an SDK and CLI. Engineering teams typically wire Coval into their existing voice stacks (LiveKit, Vapi, Retell, Bland, custom Twilio pipelines) and run scenarios on every release. Pricing is available on request, with usage-based components on top of seat or platform fees; specific list pricing is not yet public.
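To make the "run scenarios on every release" idea concrete, here is a minimal sketch of a regression gate of the kind described above. It does not use Coval's SDK or CLI, whose interfaces are not documented here. The `ScenarioResult` schema, the function names, the 800 ms latency budget, and the 95% pass-rate threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScenarioResult:
    """Outcome of one simulated conversation (hypothetical schema, not Coval's)."""
    name: str
    expected_tools: List[str]  # tool calls the scenario should trigger, in order
    actual_tools: List[str]    # tool calls the agent actually made
    latencies_ms: List[float]  # response latency for each agent turn


def scenario_passes(result: ScenarioResult, latency_budget_ms: float) -> bool:
    """A scenario passes if the tool calls match and every turn is within budget."""
    tools_ok = result.actual_tools == result.expected_tools
    latency_ok = all(ms <= latency_budget_ms for ms in result.latencies_ms)
    return tools_ok and latency_ok


def regression_gate(
    results: List[ScenarioResult],
    latency_budget_ms: float = 800.0,  # assumed budget, for illustration only
    min_pass_rate: float = 0.95,       # assumed threshold, for illustration only
) -> Tuple[bool, float]:
    """Return (deploy_allowed, pass_rate); an empty run never allows a deploy."""
    if not results:
        return False, 0.0
    passed = sum(scenario_passes(r, latency_budget_ms) for r in results)
    pass_rate = passed / len(results)
    return pass_rate >= min_pass_rate, pass_rate


# Two made-up scenarios: the second one skips an expected tool call.
sample_results = [
    ScenarioResult(
        name="reschedule_appointment",
        expected_tools=["lookup_booking", "update_booking"],
        actual_tools=["lookup_booking", "update_booking"],
        latencies_ms=[420.0, 610.0],
    ),
    ScenarioResult(
        name="refund_request",
        expected_tools=["lookup_order", "issue_refund"],
        actual_tools=["lookup_order"],
        latencies_ms=[380.0, 450.0],
    ),
]

deploy_allowed, pass_rate = regression_gate(sample_results)
print(f"pass rate {pass_rate:.0%}, deploy allowed: {deploy_allowed}")
```

In a CI pipeline, a script like this would typically exit with a non-zero status when `deploy_allowed` is false, so that the release job stops before the agent ships.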
The key competitive frame is that general LLM eval tools (Braintrust, Langfuse, Patronus, Confident AI) optimize for text agents and offline benchmarks, while Coval focuses on the messier reality of voice: latency budgets, barge-in handling, tool calls, mid-conversation drift, and robustness to accents and background noise. The Waymo lineage shows up clearly: Coval treats every voice agent like an autonomous vehicle that must drive thousands of synthetic miles before each release.
Coval's differentiator is depth in one vertical. Rather than trying to be the eval platform for every LLM use case, it doubles down on voice and chat agents, the segment where stakes are highest because failures occur in real time during customer conversations.