How does FriendliAI reduce inference costs?

It uses a purpose-built serving engine with iteration-level continuous batching and quantization (FP8, INT8, AWQ) to maximize GPU utilization, which the company says can lower costs by up to 90%.

Can I run FriendliAI in my own infrastructure?

Yes. Friendli Container packages the optimized inference engine so enterprises can deploy it inside their own VPC or on-premises environment.

What models does FriendliAI support?

It serves open-weight text, image, video, and audio models, including models pulled directly from Hugging Face, as well as customer fine-tuned models.

Is there a serverless option?

Yes, Friendli Serverless Endpoints offer instant pay-per-token access to popular open models without managing infrastructure.

Who funded FriendliAI?

Capstone Partners led both its seed and 2025 seed extension, with Sierra Ventures, Alumni Ventures, KDB Investment, and KB Securities participating, for about $26 million total.

FriendliAI

Active

Fast, cost-efficient generative AI inference for any model

📍 Redwood City, United States 📅 Founded 2021 👥 51-200 🏷 AI Infrastructure

Total raised: $26M (2 rounds)

Stage: Seed

Team: 51-200 (since 2021)

Pricing: Freemium (free plan)

Founded: 2021, Redwood City, United States

Agent-ready: API (score 55/100)

About FriendliAI

FriendliAI was founded in 2021 by CEO Byung-Gon Chun, a systems researcher at Seoul National University who previously worked on AI infrastructure at Microsoft and Meta. The company set out to solve one of the most expensive problems in applied AI: serving large generative models efficiently in production. Rather than building a foundation model of its own, FriendliAI focuses entirely on the inference layer, where GPU utilization, batching strategy, and memory management determine whether an AI product is economically viable at scale.

The core of the platform is the Friendli Engine, a serving runtime that pioneered iteration-level (continuous) batching and combines it with aggressive but accuracy-preserving quantization, including FP8, INT8, and AWQ. These techniques allow FriendliAI to pack far more concurrent requests onto each GPU than naive serving approaches, which the company says translates into as much as 90% lower inference cost and some of the fastest token-generation speeds on the market. The engine supports text, image, video, and audio generation models, and integrates directly with model hubs like Hugging Face.
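
To make the batching claim concrete, here is a toy comparison of request-level (static) batching and iteration-level (continuous) batching. It is a minimal sketch under simplifying assumptions, not the Friendli Engine's implementation: every decode step is assumed to cost the same, prefill and KV-cache memory limits are ignored, and the function names and request lengths are hypothetical.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Request-level batching: each batch occupies the GPU until its
    longest request finishes, so short requests leave slots idle."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    """Iteration-level batching: after every decode step, finished
    requests leave the batch and waiting requests take the free slots."""
    waiting = deque(lengths)   # output lengths of requests not yet admitted
    running = []               # tokens still to generate per in-flight request
    steps = 0
    while waiting or running:
        # Fill any free slots before the next iteration.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        # One decode iteration emits one token per running request;
        # requests that reach zero remaining tokens are dropped.
        running = [r - 1 for r in running if r - 1 > 0]
        steps += 1
    return steps


if __name__ == "__main__":
    # Hypothetical output lengths (in tokens) for six requests.
    lengths = [2, 8, 3, 8, 2, 3]
    print(static_batching_steps(lengths, batch_size=2))      # 19
    print(continuous_batching_steps(lengths, batch_size=2))  # 13
```

In this toy workload the same six requests finish in 13 decode steps instead of 19, because slots freed by short requests are reused immediately rather than waiting for the longest request in the batch. Real gains depend on the workload, the model, and memory constraints.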

FriendliAI offers three main deployment modes: Friendli Serverless Endpoints for instant access to popular open models on a pay-per-token basis, Friendli Dedicated Endpoints for autoscaling private deployments of custom or fine-tuned models, and Friendli Container, which lets enterprises run the optimized engine inside their own VPC or on-premises infrastructure for data residency and compliance reasons.

The company raised an initial $6 million seed round in late 2021 led by Capstone Partners, then a $20 million seed extension in 2025, again led by Capstone, with participation from Sierra Ventures, Alumni Ventures, KDB Investment, and KB Securities, bringing total funding to roughly $26 million. The fresh capital is aimed at expanding its enterprise inference platform and growing its US presence.

FriendliAI competes with a crowded field of inference clouds but differentiates on raw engine performance and flexibility across modalities and deployment surfaces, making it a fit for teams that have outgrown generic API providers and need predictable cost and latency on open-weight models.

Key capabilities

Friendli Engine with iteration-level continuous batching

FP8, INT8, and AWQ quantization for cost and speed

Serverless pay-per-token endpoints for open models

Dedicated autoscaling endpoints for custom models

Friendli Container for VPC and on-premises deployment

Multimodal serving across text, image, video, and audio

Direct Hugging Face model integration

Inference monitoring and observability dashboards

Agent readiness

55/100

Developing

MCP server

Public API

Webhooks

OAuth 2.0

SDKs

Funding history

2 · $26M

— Seed $6M Undisclosed

— Seed Extension $20M Undisclosed

Key operators

Byung-Gon Chun

Founder

Alternatives

Nebius

Full-stack AI cloud with large-scale GPU clusters for training and inference

Foundation Models, AI Infrastructure

Celestial AI

Photonic Fabric optical interconnect for AI infrastructure

AI Infrastructure

d-Matrix

Digital in-memory compute (DIMC) chiplet-based hardware purpose-built for AI inference in the

AI Infrastructure

Chainguard

Secure, minimal container images for software and AI supply chains

AI Infrastructure, AI for Cyber Defense

Alcatraz AI

AI facial authentication that replaces the access badge with your face

AI Infrastructure, AI for Cyber Defense

Zilliz

Fully managed vector database for AI, built by the creators of Milvus

AI Infrastructure, Vector Databases
