Wafer builds autonomous AI agents that act as GPU performance engineers, profiling, diagnosing, and optimizing GPU inference across kernels, models, and production pipelines, and it also provides inference infrastructure.

What infrastructure does it offer?

It offers serverless and dedicated inference infrastructure for open-source LLMs, with optimizations built in to improve speed.

How does it make inference faster?

Its agents optimize the entire stack, including kernel optimization and serving-stack rewriting, to achieve significant speed improvements.

Does it work with custom silicon?

Yes. The company is working with major chip and cloud players to optimize code for custom silicon.

Startups AI Design Wafer

Wafer

Active

AI that makes AI fast

📍 San Francisco, United States 📅 Founded 2025 👥 1-10 🏷 AI Design

Visit website

Total raised

$4.5M

2 rounds

Stage

Seed

Jan 2026

Team

1-10

since 2025

Pricing

—

Founded

2025

San Francisco, United States

Agent-ready

—

About Wafer

What Wafer does

Wafer is an inference platform for open-source large language models, providing fast serverless and dedicated AI inference as an alternative to proprietary AI services. It is built to run open models efficiently for production workloads.

Key capabilities

Wafer Serverless offers pay-as-you-go API access to a range of open models, and the company publishes throughput benchmarks positioning it against providers like Together.ai. Wafer Dedicated provides custom infrastructure for mission-critical workloads. The platform is OpenAI-API compatible, so it can serve as a drop-in replacement with existing SDKs and frameworks, and offers prompt-cache pricing that significantly reduces the cost of repeated prompt prefixes. It includes compliance features such as zero data retention options, data processing agreements and SLA-backed uptime, with model- and hardware-specific optimization across AMD and NVIDIA GPUs.

Who it's for

Wafer targets developers and enterprises running open-source LLMs in production who need low latency, high throughput, cost efficiency and compliance, including for use cases like voice agents and batch processing.

Key capabilities

Autonomous AI agents acting as GPU performance engineers

Profiling, diagnosing, and optimizing GPU inference

Optimization across the stack from kernels to models to pipelines

Kernel optimization for faster inference

Serving-stack rewriting for higher throughput

Serverless inference infrastructure for open-source LLMs

Dedicated inference infrastructure option

Optimization of code for custom silicon

Agent readiness

12/100

Early

MCP server

Public API

Webhooks

OAuth 2.0

SDKs

No public agent surfaces detected yet.

Funding history

2 · $4.5M

Cumulative raise

From 2025 to 2026 · 2 rounds tracked

Total

$4.5M

Jan 2026 Seed $4M ● Fifty Years

Jan 2025 Seed $500K ● Y Combinator

Capital network

$4.5M raised ·3 backers·10 network links

Backers3
Y CombinatorLead investorLead Fifty YearsLead investorLead Liquid21 round
Shared portfoliocompanies these backers also fund
Moonvalley1 Onyx1 Raycast1 Prosper AI1 Latent1
Extended networkfunds that co-invest alongside them
General Catalyst3 Khosla Ventures3 Andreessen Horowitz2 Accel2 Bessemer Venture Partners1

Key operators

Emilio Andere

Co-Founder & CEO

Steven Arellano

Co-Founder

Alternatives

6 All →

Miro

AI-powered visual collaboration and online whiteboard platform for distributed teams

AI ProductivityAI Design

Webflow

Visual no-code website builder with AI for design, content, and optimization

AI MarketingAI Design

Neural Concept

Swiss CAD-native AI platform that brings deep-learning surrogate models and generative design

AI Design

Lovable

The AI Fullstack Engineer that ships full-stack applications 20x faster than writing code

AI CodingAI Developer Tools

Builder.io

Visual AI platform: design-to-code, AI app building and an agentic CMS

AI CodingAI Design

Quilter

Physics-driven AI platform that fully automates printed circuit board (PCB) layout

AI Design

Frequently asked

What does Wafer do?: Wafer builds autonomous AI agents that act as GPU performance engineers, profiling, diagnosing, and optimizing GPU inference across kernels, models, and production pipelines, and it also provides inference infrastructure.
What infrastructure does it offer?: It offers serverless and dedicated inference infrastructure for open-source LLMs, with optimizations built in to improve speed.
How does it make inference faster?: Its agents optimize the entire stack, including kernel optimization and serving-stack rewriting, to achieve significant speed improvements.
Does it work with custom silicon?: Yes. The company is working with major chip and cloud players to optimize code for custom silicon.

Discussion

Watching

Get Wafer updates

New funding, product launches, and team changes — to your inbox.

Follow startup

Claim ownership

Verify with your work email to manage this listing.

Explore more around Wafer

Contextual paths to related AI startups, deals and rankings.

Similar to Wafer

Country

United States AI startups

Compare

Alternatives

All alternatives to Wafer

Wafer

Claim Wafer

Enter your code

Claim approved

Claim received

Claim Wafer

Enter your code

Claim approved

Claim received

About Wafer

What Wafer does

Key capabilities

Who it's for

Key capabilities

Agent readiness

Funding history

Capital network

Key operators

Emilio Andere

Steven Arellano

Alternatives

Miro

Webflow

Neural Concept

Lovable

Builder.io

Quilter

Frequently asked

Explore more around Wafer

Similar to Wafer

Categories

Country

Compare

Alternatives

Rankings