Pleias

Active

Paris lab training small open models on fully open, rights-cleared data

📍 Paris 📅 Founded 2024 🏷 Foundation Models

Total raised: not listed

Stage: not listed

Team: not listed

Pricing: Enterprise, free plan

Founded: 2024, Paris

Agent-ready: not listed

About Pleias

Ethical data as a foundation

Pleias was founded in Paris in 2024 by Pierre-Carl Langlais, Anastasia Stasenko and Ivan Yamshchikov, and operates out of Station F. The lab's core bet is that the data layer, not raw scale, determines model quality. Its Common Corpus is the largest openly licensed multilingual dataset for LLM pretraining (~2 trillion tokens); the work was accepted at ICLR, and the dataset is reused by organizations including Anthropic, IBM, StepFun and Elastic.

Small models, synthetic data

Pleias pretrains compact reasoning models entirely on open and synthetic data, including the Pleias-RAG family of small retrieval-augmented reasoners that quote their sources. Its SYNTH pipeline generates synthetic pretraining datasets fully autonomously, and its products Synth (agent training data), Stratum (document structuring) and Common Corpus power enterprise deployments. A 600M-parameter model trained for Paris transit operator RATP reportedly outperformed closed models 200x its size on domain tasks.
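For a rough sense of scale, the "200x its size" comparison implies closed models of about 120 billion parameters. The following is a minimal Python sketch of that arithmetic, assuming the ratio refers to parameter count (the listing does not say so explicitly). The function name is hypothetical and used only for illustration.

```python
# Illustrative arithmetic only. The 600M and 200x figures come from the
# listing; reading "size" as parameter count is an assumption.
SMALL_MODEL_PARAMS = 600_000_000  # the 600M-parameter model
SIZE_RATIO = 200                  # "200x its size"


def implied_large_model_params(small_params: int, ratio: int) -> int:
    """Return the parameter count of a model `ratio` times larger."""
    return small_params * ratio


implied = implied_large_model_params(SMALL_MODEL_PARAMS, SIZE_RATIO)
print(f"{implied / 1e9:.0f}B parameters")  # prints "120B parameters"
```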

European open AI

Pleias collaborates with NVIDIA, Mozilla and the Wikimedia Foundation and deploys on-premise models in regulated sectors (transport, banking, telecom, energy and healthcare), making it a distinctive European voice for provably clean, open AI training.

Key capabilities

Common Corpus: ~2T-token rights-cleared multilingual pretraining dataset

SYNTH: fully autonomous synthetic pretraining data pipeline

Pleias-RAG: small reasoning models that cite their sources

Synth: product for expert-level agent training data

Stratum: document-to-structured-data conversion for agentic workflows

On-premise deployment for regulated industries

Agent readiness

Score: 0/100 (Early)

Surfaces checked: MCP server, Public API, Webhooks, OAuth 2.0, SDKs

No public agent surfaces detected yet.

Key operators

Anastasia Stasenko: Co-founder & CEO

Ivan Yamshchikov: Co-founder

Pierre-Carl Langlais: Co-founder

Alternatives

Thinking Machines Lab: Frontier AI research lab building customizable, multimodal models. Categories: AI Developer Tools, Foundation Models.

AMI Labs: Yann LeCun's lab building foundational world models for real-world AI. Categories: Foundation Models, Research Assistants.

Anthropic: AI safety lab building Claude, a helpful, harmless, honest AI assistant. Categories: AI Chatbots, Foundation Models.

OpenAI: Creator of ChatGPT, GPT-4, and the leading frontier AI lab. Categories: AI Chatbots, AI Developer Tools.

Perplexity: AI-powered answer engine delivering real-time, cited responses to complex queries. Categories: AI Search, AI Productivity.

Reflection AI: America's open frontier AI lab building autonomous coding agents. Categories: Foundation Models, Open Source AI.

Frequently asked

What is Pleias?: Pleias is a Paris-based AI lab founded in 2024 that builds open datasets, synthetic data pipelines and small specialized reasoning models trained entirely on rights-cleared data.
What is Common Corpus?: The largest openly licensed multilingual dataset for LLM pretraining (~2 trillion tokens), created by Pleias and used by labs including Anthropic, IBM and StepFun.
Who founded Pleias?: Pierre-Carl Langlais, Anastasia Stasenko and Ivan Yamshchikov founded the company in Paris in 2024.
Are Pleias models open?: Yes, Pleias releases its models and datasets openly on Hugging Face, alongside commercial enterprise offerings.
Why do small models matter here?: Pleias showed that a 600M-parameter model trained on curated domain data can outperform far larger closed models on specialized tasks at a fraction of the cost.
