

Lemma

Active

Reliability platform that catches AI agent regressions and auto-optimizes prompts in production.

Location: San Francisco, United States | Founded: 2025 | Category: AI Agents


Total raised: $500K
Stage: Seed (Sep 2025)
Team: not listed
Pricing: Contact sales, free trial available
Founded: 2025
Location: San Francisco, United States
Agent-ready: not listed

About Lemma

Lemma is a reliability and observability platform built specifically for AI agents running in production. It addresses the well-documented problem that agent performance silently degrades over time as user behavior drifts and new edge cases appear, with some teams seeing roughly a 40 percent quality drop within weeks of launch. Lemma was founded in 2025 by Jerry Zhang and Cole Gawin, who left college to join the Y Combinator F25 batch after watching every AI company they spoke with rebuild the same internal evaluation and prompt-tuning infrastructure from scratch.

The product closes the loop between deployment and improvement. Lemma combines automated online evaluations with real user signals such as task adherence, frustration, and recovery rate, then detects failed outcomes directly from live traffic. When a regression appears, Lemma automatically traces the root cause to a specific span or model output, proposes a fix as a prompt change, guardrail, or config update, and can ship that fix as a one-click pull request. A cluster discovery feature continuously embeds traces to surface emerging failure patterns without any manual labeling.
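The listing does not say how Lemma decides that a regression has occurred. As a minimal sketch, assuming per-conversation scores for one signal (for example task adherence, scaled 0 to 1), a regression check can compare the mean of a recent window against a launch-time baseline and flag a relative drop above a threshold. The names `detect_regression`, `RegressionReport`, and `max_relative_drop`, the 10 percent default, and the sample scores are hypothetical; none of them are part of Lemma's product or API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RegressionReport:
    """Summary of a baseline-vs-recent comparison for one quality signal."""
    signal: str
    baseline_mean: float
    recent_mean: float
    relative_drop: float
    regressed: bool


def detect_regression(
    signal: str,
    baseline: list[float],
    recent: list[float],
    max_relative_drop: float = 0.10,
) -> RegressionReport:
    """Flag a regression when the recent mean falls more than
    `max_relative_drop` (a fraction) below the baseline mean.

    Hypothetical helper for illustration only; not Lemma's API.
    """
    if not baseline or not recent:
        raise ValueError("both windows need at least one score")
    base = mean(baseline)
    now = mean(recent)
    # Treat a zero baseline as "no measurable drop" to avoid dividing by zero.
    drop = (base - now) / base if base > 0 else 0.0
    return RegressionReport(signal, base, now, drop, drop > max_relative_drop)


if __name__ == "__main__":
    # Made-up task-adherence scores: launch week vs. a later week.
    launch_week = [0.92, 0.88, 0.95, 0.90, 0.91]
    later_week = [0.61, 0.55, 0.58, 0.60, 0.52]
    report = detect_regression("task_adherence", launch_week, later_week)
    print(f"{report.signal}: {report.relative_drop:.0%} drop, regressed={report.regressed}")
```

A fixed relative-drop threshold is the simplest possible rule; a production system would more plausibly also require a minimum sample size or a significance test so that a handful of bad conversations does not trigger a false alarm.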

Lemma integrates natively with Vercel AI SDK, OpenAI Agents, Langfuse, Arize Phoenix, Azure Monitor, and LangGraph, with Claude SDK support on the roadmap. Customer reports cite roughly 90 percent less manual prompt iteration, production drift resolution in minutes rather than days, and two to five percent model performance gains per optimization cycle. Pricing is currently available only by contacting sales, which fits the platform's positioning toward engineering teams running mission-critical agent deployments rather than hobby projects.

Key capabilities

Continuous online evaluations combining synthetic and real user signals

Automated root cause analysis down to span and model output

One-click prompt, guardrail, and config fix pull requests

Natural language semantic search over traces and runs

Cluster discovery for unlabeled emerging failure patterns

Integrations with Vercel AI SDK, OpenAI Agents, Langfuse, Arize Phoenix, Azure Monitor, and LangGraph
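Cluster discovery is described only at a high level. One simple way to group unlabeled traces, assuming each trace has already been embedded as a vector, is single-pass leader clustering by cosine similarity: a trace joins the first cluster whose leader is similar enough, otherwise it starts a new cluster. The helper names, the 0.9 threshold, and the toy three-dimensional vectors below are illustrative assumptions; this is not a description of Lemma's algorithm.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def leader_cluster(vectors: list[list[float]], threshold: float = 0.9) -> list[list[int]]:
    """Single-pass leader clustering (illustrative, not Lemma's algorithm).

    Each vector joins the first existing cluster whose leader (its first
    member) has cosine similarity >= threshold; otherwise it becomes the
    leader of a new cluster. Returns clusters as lists of input indices.
    """
    leaders: list[list[float]] = []
    clusters: list[list[int]] = []
    for i, vec in enumerate(vectors):
        for leader, members in zip(leaders, clusters):
            if cosine(vec, leader) >= threshold:
                members.append(i)
                break
        else:  # no leader was similar enough: start a new cluster
            leaders.append(vec)
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    # Hypothetical failed-trace summaries with made-up 3-d embeddings.
    traces = [
        ("refund tool timed out", [0.90, 0.10, 0.00]),
        ("refund API timeout again", [0.85, 0.15, 0.05]),
        ("agent looped on a repeated question", [0.10, 0.90, 0.10]),
        ("agent looped on clarification", [0.05, 0.95, 0.00]),
        ("answered in the wrong language", [0.00, 0.10, 0.95]),
    ]
    groups = leader_cluster([vec for _, vec in traces])
    for n, members in enumerate(groups):
        print(f"cluster {n}: {[traces[i][0] for i in members]}")
```

Leader clustering needs no preset cluster count and handles traces one at a time as they arrive, but its result depends on arrival order, so it is best read as a toy stand-in for whatever grouping method a real system uses.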

Technology stack

Technologies detected: 4 (as of May 30, 2026)
Estimated monthly stack spend: ~$200/mo
Analytics: Google Analytics
CDN: Cloudflare
Framework: webpack
Infra: Vercel

Agent readiness

Score: 0/100 (Early)
Surfaces checked: MCP server, public API, webhooks, OAuth 2.0, SDKs
No public agent surfaces detected yet.

Alternatives


Cursor

The AI code editor built for productive engineers.

AI Coding, AI Agents

Uniphore

Enterprise business AI for conversations, agents, and data

AI Agents, AI Customer Support

Nebius

Full-stack AI cloud with large-scale GPU clusters for training and inference

Foundation Models, AI Infrastructure

Celestial AI

Photonic Fabric optical interconnect for AI infrastructure

AI Infrastructure

d-Matrix

Digital in-memory compute (DIMC) chiplet-based hardware purpose-built for AI inference in the

AI Infrastructure

Chainguard

Secure, minimal container images for software and AI supply chains

AI Infrastructure, AI for Cyber Defense

Frequently asked

What is Lemma?
Lemma is a reliability and observability platform built specifically for AI agents running in production. It addresses the problem that agent performance can silently degrade over time as user behavior drifts and new edge cases appear.

How does Lemma fix agent regressions?
Lemma detects drift and regressions in production and ships fixes as pull requests, including auto-optimized prompts. By delivering changes as PRs, it lets engineering teams review and merge improvements through their normal workflow.

Why do AI agents degrade in production?
Agent quality can drop as real-world user behavior drifts away from what the agent was tuned for and as new edge cases surface. Lemma cites teams seeing roughly a 40 percent quality drop within weeks of launch, which it aims to catch and correct.

What problem was Lemma created to solve?
Its founders noticed that AI companies were each rebuilding the same internal evaluation and prompt-tuning infrastructure from scratch. Lemma packages that reliability tooling so teams do not have to reinvent it.

Who founded Lemma?
Lemma was founded in 2025 by Jerry Zhang and Cole Gawin, who left college to join the Y Combinator F25 batch. The company focuses on keeping production AI agents reliable over time.
