
Galileo

Active

Evaluation and observability platform that helps enterprise AI teams measure, debug, and

📅 Founded 2021 👥 11-50 🏷 AI Evaluation


Total raised: $63M · 2 rounds

Stage: Series B

Team: 11-50 · since 2021

Pricing: Freemium · free plan

Founded: 2021

Agent-ready: —

About Galileo

Galileo is a San Francisco-based company building an evaluation intelligence platform for teams developing large language model and agentic AI applications. The core problem Galileo addresses is that traditional software metrics break down for non-deterministic LLM systems: the same prompt can produce different outputs, and failures like hallucination, prompt injection, or broken tool calls are difficult to catch with conventional testing. Galileo replaces ad-hoc spreadsheets and manual spot checks with a structured, quantitative approach to measuring AI quality.

The platform is anchored by Galileo's proprietary evaluation research, including small specialized models such as its Luna evaluation family that score outputs for factuality, context adherence, toxicity, and other dimensions without requiring ground-truth labels for every example. Teams define test sets and custom metrics, run experiments to compare prompts and models, and then carry the same evaluators into production as continuous monitoring. Galileo Protect adds real-time guardrails that can intercept harmful or off-policy responses before they reach users.
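As a rough illustration of reference-free scoring feeding a guardrail, the sketch below scores a response against its retrieved context and swaps in a fallback when the score is too low. It is not Galileo's API or the Luna models: `context_adherence`, `guard`, the token-overlap heuristic, and the 0.6 threshold are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercase word tokens; a crude stand-in for a learned evaluator."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def context_adherence(response: str, context: str) -> float:
    """Fraction of response tokens that also occur in the context.

    No ground-truth answer is needed: the response is judged only
    against the context it was supposed to be grounded in.
    """
    resp = _tokens(response)
    if not resp:
        return 1.0  # an empty response makes no unsupported claims
    return len(resp & _tokens(context)) / len(resp)


def guard(response: str, context: str, threshold: float = 0.6,
          fallback: str = "Sorry, I can't answer that reliably.") -> str:
    """Return the response only if it clears the (hypothetical) threshold."""
    if context_adherence(response, context) >= threshold:
        return response
    return fallback


context = "The Series B closed in October 2024."
print(guard("The Series B closed in October 2024", context))
print(guard("The company was founded on Mars", context))
```

A production evaluator would replace the token-overlap heuristic with a trained model, but the control flow (score, compare, intercept) stays the same.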

Galileo has expanded heavily into agent evaluation, where the surface area for failure is much larger: agents plan, call tools, retrieve context, and chain multiple model calls. The platform traces each step, surfaces where an agent went wrong, and helps teams attribute failures to specific spans. This is increasingly important as enterprises move from single-prompt chatbots to multi-step autonomous workflows.
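To make span-level failure attribution concrete, here is a minimal sketch that walks an agent trace and reports the deepest failing spans. The `Span` type, its fields, and `failing_paths` are hypothetical names for illustration and do not reflect Galileo's trace schema.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """One step in an agent trace (hypothetical schema)."""
    name: str
    kind: str  # e.g. "plan", "tool", "retrieval", "llm"
    ok: bool = True
    children: list["Span"] = field(default_factory=list)


def failing_paths(span: Span, prefix: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Return paths to the deepest failing spans.

    A parent that failed only because a child failed is not reported;
    the failure is attributed to the child instead.
    """
    path = prefix + (span.name,)
    found: list[tuple[str, ...]] = []
    for child in span.children:
        found.extend(failing_paths(child, path))
    if not found and not span.ok:
        found.append(path)
    return found


trace = Span("agent_run", "plan", ok=False, children=[
    Span("search_docs", "retrieval"),
    Span("call_billing_api", "tool", ok=False),
    Span("final_answer", "llm"),
])
print(failing_paths(trace))
```

Here the run as a whole is marked failed, but the attribution points at the `call_billing_api` tool span rather than the top-level plan.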

The company raised a $45M Series B in October 2024 led by Scale Venture Partners with participation from Premji Invest, Databricks Ventures, ServiceNow Ventures, Citi Ventures, and other strategic investors, bringing total funding to roughly $68M. Galileo reports rapid enterprise adoption, working with large customers across financial services, telecommunications, and technology who need auditable confidence in their AI before deploying it to customers.

Key capabilities

Auto-evaluation metrics for hallucination, factuality, and context adherence without ground-truth labels

Proprietary Luna evaluation models for low-cost, high-throughput scoring

Agent evaluation with step-by-step trace analysis and failure attribution

Real-time guardrails (Galileo Protect) to intercept harmful or off-policy outputs

Experiment tracking to compare prompts, models, and RAG configurations

Production observability with continuous quality monitoring and alerting

Custom metric and LLM-as-a-judge configuration for domain-specific criteria

Human-in-the-loop labeling workflows for building evaluation datasets

Agent readiness

10/100 · Early

MCP server · Public API · Webhooks · OAuth 2.0 · SDKs

No public agent surfaces detected yet.

Funding history

2 rounds · $63M

— Series B $45M incl. Battery Ventures +5

— Series A $18M incl. Battery Ventures +2

Capital network

$68M raised · 8 backers · 10 network links

Backers (8)
Battery Ventures (2 rounds), Scale Venture Partners (1 round), Databricks Ventures (1 round), Citi Ventures (1 round), Premji Invest (1 round), ServiceNow Ventures (1 round), +2 more backers

Shared portfolio: companies these backers also fund
OpenRouter (2), Sigma Computing (2), Orkes (1), Level AI (1), Tahoe Therapeutics (1)

Extended network: funds that co-invest alongside them
Spark Capital (3), Avenir Growth Capital (3), Andreessen Horowitz (2), Menlo Ventures (2), Eniac Ventures (2)

Key operators

Atindriyo Sanyal

Co-founder & CTO

Vikram Chatterji

Co-founder & CEO

Yash Sheth

Co-founder & COO

Alternatives


LMArena

Crowdsourced leaderboard for ranking AI models

Foundation Models · AI Evaluation

Giskard

Secure AI agents with continuous AI red teaming

AI Evaluation · AI Safety

Traceloop

LLM observability and reliability built on the open-source OpenLLMetry standard

AI Evaluation · AI Observability

micro1

Human intelligence infrastructure for high-quality AI training data

AI Evaluation · Data Labeling

Scale AI

Data labeling and AI infrastructure platform powering frontier models for enterprises and governments.

AI Infrastructure · AI Evaluation

Confident AI

DeepEval-powered LLM evaluation and observability

AI Evaluation · AI Observability

Frequently asked

What does Galileo do?

Galileo provides an evaluation and observability platform that measures the quality, safety, and reliability of LLM and agentic AI applications across development and production, using proprietary auto-evaluation models and real-time guardrails.

How is Galileo different from generic LLM observability tools?

Beyond tracing requests, Galileo invests in proprietary research-grade evaluation models (such as its Luna family) that score outputs for factuality and context adherence without needing labeled ground truth, and it adds active guardrails to block bad outputs.

Does Galileo support AI agent evaluation?

Yes. Galileo traces each step of an agent's plan, tool calls, and retrievals, attributes failures to specific spans, and provides metrics tailored to multi-step autonomous workflows.

Who funds Galileo?

Galileo raised a $45M Series B in October 2024 led by Scale Venture Partners, with participation from Premji Invest, Databricks Ventures, ServiceNow Ventures, and Citi Ventures, bringing total funding to about $68M.

Can Galileo run guardrails in production?

Yes. Galileo Protect applies real-time guardrails that intercept harmful, hallucinated, or off-policy responses before they reach end users.
