The lab that started the open LLM movement
EleutherAI began in July 2020 as a Discord collective of independent researchers determined to replicate GPT-3 in the open. Founded by Connor Leahy, Leo Gao and Sid Black, it released The Pile, a landmark 825 GiB training dataset, followed by GPT-Neo, GPT-J-6B and GPT-NeoX-20B, which were the largest open-source GPT-style models in the world at release. In early 2023 the collective incorporated as the non-profit EleutherAI Institute, which is led by executive director Stella Biderman and employs over twenty full-time researchers.
Research focus
Beyond model training, EleutherAI produces widely used infrastructure such as the LM Evaluation Harness (the backbone of many public leaderboards) and the Pythia model suite for studying training dynamics. Current work spans interpretability, alignment, eliciting latent knowledge, and openly licensed datasets like Common Pile.
Why it matters
EleutherAI's artifacts underpin thousands of academic papers and commercial systems. Its donor-funded operating model, supported by CoreWeave, Hugging Face, Stability AI and others, shows that frontier-adjacent research can happen fully in the open.