Braintrust is an AI observability and evaluation platform designed for teams building production AI features. The product helps engineers and product teams systematically measure model and prompt quality, monitor production traces, and iterate quickly on what is shipped. It combines an evals framework, prompt and dataset management, online and offline experiment tracking, and trace-level observability into a single workflow.

The company was founded by Ankur Goyal, who previously built Impira, sold it to Figma, and then led machine learning engineering work at Figma. Goyal has said publicly that Braintrust grew directly out of his own pain: at both Impira and Figma, he and his teams had to build internal evals tooling from scratch every time they shipped a new AI feature.

Braintrust has raised more than $80M to date. Its most recent round, an $80M Series B announced in February 2026, was led by ICONIQ with participation from Andreessen Horowitz, Greylock, Elad Gil, and basecase capital, and valued the company at roughly $800M. The customer list is notable: Notion, Stripe, Vercel, Airtable, Instacart, Zapier, Ramp, Dropbox, Cloudflare, and BILL all use the platform, which makes Braintrust a common choice of observability layer among AI-native and AI-adopting companies.

The core workflow centers on evals. Teams define datasets and scoring functions (code-based, LLM-as-judge, or human review) and run them against prompt and model variants, either in CI or ad hoc. Results are versioned, so teams can see whether a change improved or degraded quality before shipping. Production traces feed back into datasets, so issues caught in production can drive new tests.
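The loop described above can be sketched in plain Python. This is a minimal illustration of the idea, not Braintrust's SDK: the names `Case`, `exact_match`, `run_eval`, `regressed`, and `add_case_from_trace`, along with the two toy variants, are all hypothetical stand-ins chosen for the example. It shows a dataset, a code-based scorer, two variants scored against the same cases, a regression check of the kind a CI job might run, and a production failure being turned into a new test case.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass(frozen=True)
class Case:
    """One dataset row: an input and the output expected for it."""
    input: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Code-based scorer: 1.0 on an exact string match, else 0.0."""
    return 1.0 if output == expected else 0.0


def run_eval(task: Callable[[str], str], dataset: list[Case],
             scorer: Callable[[str, str], float]) -> float:
    """Run one variant over every case and return its mean score."""
    return mean(scorer(task(case.input), case.expected) for case in dataset)


def regressed(baseline_score: float, candidate_score: float,
              tolerance: float = 0.0) -> bool:
    """True when the candidate scores worse than the baseline beyond tolerance."""
    return candidate_score < baseline_score - tolerance


def add_case_from_trace(dataset: list[Case], trace_input: str,
                        corrected_output: str) -> list[Case]:
    """Return a new dataset that also covers an input seen failing in production."""
    return dataset + [Case(trace_input, corrected_output)]


# Hypothetical stand-ins for two prompt or model variants.
def baseline(text: str) -> str:
    return text.upper()


def candidate(text: str) -> str:
    return text.strip().upper()


DATASET = [Case("hello", "HELLO"), Case(" world ", "WORLD"), Case("Evals", "EVALS")]

if __name__ == "__main__":
    base = run_eval(baseline, DATASET, exact_match)
    cand = run_eval(candidate, DATASET, exact_match)
    print(f"baseline={base:.2f} candidate={cand:.2f} regressed={regressed(base, cand)}")
```

On this toy dataset the baseline fails the padded input and the candidate passes every case, so the check reports no regression. In a real setup the scorer could equally be an LLM-as-judge call or a human rating, and the scores would be stored per version rather than printed.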

Braintrust competes with LangSmith, Arize, Weights & Biases, and a growing field of LLM observability tools. Its differentiation is a focus on evals as a first-class workflow — not just dashboards — and on developer experience for engineering teams that already think in terms of unit tests, CI, and code review.