Galileo is a San Francisco-based company building an evaluation intelligence platform for teams developing large language model (LLM) and agentic AI applications. The core problem Galileo addresses is that traditional software metrics break down for non-deterministic LLM systems: the same prompt can produce different outputs, and failures such as hallucination, prompt injection, or broken tool calls are difficult to catch with conventional testing. Galileo replaces ad-hoc spreadsheets and manual spot checks with a structured, quantitative approach to measuring AI quality.

The platform is anchored by Galileo's proprietary evaluation research, including small specialized models such as its Luna evaluation family that score outputs for factuality, context adherence, toxicity, and other dimensions without requiring ground-truth labels for every example. Teams define test sets and custom metrics, run experiments to compare prompts and models, and then carry the same evaluators into production as continuous monitoring. Galileo Protect adds real-time guardrails that can intercept harmful or off-policy responses before they reach users.
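The workflow described above (a test set, a reference-free metric, an experiment comparing variants, then the same metric reused as a production check) can be sketched in plain Python. This is a minimal illustration and not Galileo's SDK: `Example`, `context_overlap`, `run_experiment`, and `monitor` are hypothetical names, and the token-overlap score is a toy stand-in for a learned evaluator such as a Luna model.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List


@dataclass
class Example:
    """One test-set row: a question plus the retrieved context (no gold answer)."""
    question: str
    context: str


def context_overlap(output: str, context: str) -> float:
    """Toy reference-free metric: fraction of output tokens found in the context.

    Assumption: a crude proxy for context adherence, used only for illustration.
    """
    out_tokens = output.lower().split()
    if not out_tokens:
        return 0.0
    ctx_tokens = set(context.lower().split())
    return sum(tok in ctx_tokens for tok in out_tokens) / len(out_tokens)


def run_experiment(
    generate: Callable[[Example], str],
    test_set: List[Example],
    metric: Callable[[str, str], float],
) -> float:
    """Score one prompt/model variant by its mean metric value over the test set."""
    return mean([metric(generate(ex), ex.context) for ex in test_set])


def monitor(output: str, context: str,
            metric: Callable[[str, str], float], threshold: float = 0.5) -> bool:
    """Production check reusing the same metric: True means the response is flagged."""
    return metric(output, context) < threshold


# Hypothetical stand-ins for two prompt variants (no real model is called).
def grounded_variant(ex: Example) -> str:
    return ex.context


def ungrounded_variant(ex: Example) -> str:
    return "I believe the answer is probably forty two"


test_set = [Example("When was it founded?", "the company was founded in 2021")]
grounded_score = run_experiment(grounded_variant, test_set, context_overlap)
ungrounded_score = run_experiment(ungrounded_variant, test_set, context_overlap)
```

The point of the sketch is the shape rather than the metric: because the evaluator needs only the output and its context, the function that ranks variants offline can also gate individual responses online.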

Galileo has expanded heavily into agent evaluation, where the surface area for failure is much larger: agents plan, call tools, retrieve context, and chain multiple model calls. The platform traces each step, surfaces where an agent went wrong, and helps teams attribute failures to specific spans. This step-level visibility matters more as enterprises move from single-prompt chatbots to multi-step autonomous workflows.
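To make span-level attribution concrete, the sketch below models an agent run as an ordered list of spans and locates the earliest failing one. It is a simplified illustration under stated assumptions, not Galileo's trace format: the `Span` fields, the `kind` labels, and the helper names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    """One step in an agent run (hypothetical schema for illustration)."""
    name: str
    kind: str          # assumed labels: "plan", "tool", "retrieval", "llm"
    ok: bool
    detail: str = ""


def first_failure(trace: List[Span]) -> Optional[Span]:
    """Return the earliest failing span, or None if every step succeeded."""
    for span in trace:
        if not span.ok:
            return span
    return None


def failures_by_kind(traces: List[List[Span]]) -> Dict[str, int]:
    """Count earliest failures per span kind across many runs."""
    counts: Counter = Counter()
    for trace in traces:
        failed = first_failure(trace)
        if failed is not None:
            counts[failed.kind] += 1
    return dict(counts)


# A made-up run: the tool call fails, so the final answer is also marked bad.
trace = [
    Span("plan_trip", "plan", ok=True),
    Span("search_flights", "tool", ok=False, detail="missing required argument"),
    Span("summarize", "llm", ok=False, detail="answer built on empty tool result"),
]
root = first_failure(trace)
```

Attributing the run to the earliest failing span, rather than the last, is a deliberate simplification: later failures are often downstream symptoms, as with the `summarize` step here.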

The company raised a $45M Series B in October 2024 led by Scale Venture Partners with participation from Premji Invest, Databricks Ventures, ServiceNow Ventures, Citi Ventures, and other strategic investors, bringing total funding to roughly $68M. Galileo reports rapid enterprise adoption, working with large organizations across financial services, telecommunications, and technology that need auditable confidence in their AI before releasing it to end users.