Wafer builds autonomous AI agents that act as performance engineers, profiling, diagnosing, and optimizing GPU inference across the entire stack from kernels to models to production pipelines. The company provides serverless and dedicated inference infrastructure for open-source LLMs, achieving significant speed improvements through kernel optimization and serving-stack rewriting. It is already working with major chip and cloud players to optimize code for custom silicon. Founded in 2025 by two University of Chicago grads, Wafer was part of YC's Summer 2025 batch.
Wafer
Active
AI that makes AI fast
Total raised: $4.5M (2 rounds)
Stage: Seed (Jan 2026)
Team: 1-10 (since 2025)
Pricing
—
Founded: 2025
Location: San Francisco, United States
Agent-ready
—
Autonomous AI agents acting as GPU performance engineers
Profiling, diagnosing, and optimizing GPU inference
Optimization across the stack from kernels to models to pipelines
Kernel optimization for faster inference
Serving-stack rewriting for higher throughput
Serverless inference infrastructure for open-source LLMs
Dedicated inference infrastructure option
Optimization of code for custom silicon
Agent-ready score: 12/100 (Early)
- MCP server
- Public API
- Webhooks
- OAuth 2.0
- SDKs
No public agent surfaces detected yet.
Cumulative raise
From 2025 to 2026 · 2 rounds tracked
Total: $4.5M
- Jan 2026 · Seed · $4M · Fifty Years
- Jan 2025 · Seed · $500K · Y Combinator
Capital network
$4.5M raised · 3 backers · 10 network links
- Backers: 3
- Shared portfolio: companies these backers also fund
- Extended network: funds that co-invest alongside them
Meshy
AI 3D model generator for game dev and creators
AI Developer Tools · 3D Generation
Higharc
AI-powered connected homebuilding platform
AI Design · AI Real Estate
Raspberry AI
AI Design
Scenario
Creative AI infrastructure for generating game-ready art assets at scale
AI Design · AI Gaming
Vizcom
AI Design
qbiq
AI that generates optimized floor plans, 3D tours and CAD models in minutes
AI Design · AI Construction
- What does Wafer do?
- Wafer builds autonomous AI agents that act as GPU performance engineers, profiling, diagnosing, and optimizing GPU inference across kernels, models, and production pipelines, and it also provides inference infrastructure.
- What infrastructure does it offer?
- It offers serverless and dedicated inference infrastructure for open-source LLMs, with optimizations built in to improve speed.
- How does it make inference faster?
- Its agents optimize the entire stack, including kernel optimization and serving-stack rewriting, to achieve significant speed improvements.
- Does it work with custom silicon?
- Yes. The company is working with major chip and cloud players to optimize code for custom silicon.