Inside Amazon's Trainium Lab That Won Over OpenAI
The Austin facility where a team born from a $350 million acquisition now builds the chips powering two of the world's biggest AI labs.

Amazon recently opened the doors of its secretive Trainium chip lab in Austin, Texas, offering a rare look at the silicon that has attracted both Anthropic and OpenAI as committed customers. The tour came days after CEO Andy Jassy announced a $50 billion investment deal with OpenAI, a partnership that will deliver 2 gigawatts of Trainium computing capacity to the model maker. With 1.4 million Trainium chips already deployed across three generations, Amazon is making the most aggressive play yet to challenge Nvidia's grip on AI compute.
The Chip Nvidia Should Worry About
Amazon's pitch is simple but potent. Its latest Trainium3, released in December 2025, runs on servers that cost up to 50% less than conventional GPU-based cloud servers for comparable performance. Fabricated by TSMC on a 3-nanometer process, each Trainium3 chip delivers 2.52 petaflops of FP8 compute with 144 GB of HBM3e memory and 4.9 terabytes per second of memory bandwidth. A single Trn3 UltraServer packs 144 of these chips together, delivering 362 petaflops in aggregate.
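The aggregate figure follows directly from the per-chip specs; a quick back-of-the-envelope check in Python (the per-chip numbers come from the article, the rounding is mine):

```python
# Sanity check of the Trn3 UltraServer figures quoted above.
chips_per_ultraserver = 144
fp8_petaflops_per_chip = 2.52
hbm_gb_per_chip = 144

total_pf = chips_per_ultraserver * fp8_petaflops_per_chip
total_hbm_tb = chips_per_ultraserver * hbm_gb_per_chip / 1000

print(f"Aggregate FP8 compute: {total_pf:.2f} petaflops")  # 362.88, quoted as 362
print(f"Aggregate HBM3e:       {total_hbm_tb:.2f} TB")
```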
New custom Neuron switches designed by the same team allow every Trainium3 chip to communicate with every other chip in a mesh configuration, slashing latency. “That's why Trainium3 is breaking all kinds of records,” said Mark Carroll, director of engineering at the lab, pointing to gains in “price per power” that compound rapidly when trillions of tokens per day are involved.
Where Anthropic's Claude Actually Lives
The biggest single deployment of Trainium sits inside Project Rainier, one of the world's largest AI compute clusters. It went live in late 2025 with 500,000 Trainium2 chips dedicated to Anthropic. In total, Anthropic's Claude runs on more than 1 million Trainium2 chips. Trainium2 also handles the majority of inference traffic on Amazon Bedrock, the platform that enterprise customers use to build AI applications with multiple models.
“Our customer base is just expanding as fast as we can get capacity out there,” said Kristopher King, the lab's director. He added that “Bedrock could be as big as EC2 one day,” comparing the AI platform's growth potential to AWS's flagship compute service.
OpenAI Joins the Trainium Bet
The $50 billion Amazon investment in OpenAI, announced in February 2026, includes an initial $15 billion outlay with the remaining $35 billion contingent on conditions that may include an OpenAI IPO. The deal makes AWS the exclusive cloud provider for OpenAI Frontier, the company's AI agent builder. It also expands a prior $38 billion compute agreement by $100 billion over eight years.
OpenAI's commitment to consume 2 gigawatts of Trainium capacity is what makes this deal significant for the chip team. That commitment spans both Trainium3 and its successor, Trainium4, which engineers at the Austin lab are already designing. According to the Financial Times, Microsoft, OpenAI's longtime partner, believes the Amazon deal may violate its own agreement with OpenAI.
Cerebras and the Inference Speed Race
AWS announced a partnership with Cerebras Systems in March 2026 to build what it calls the fastest inference solution available through Amazon Bedrock. The system splits inference into two stages. Trainium handles prompt processing, where parallelism matters most, while the Cerebras CS-3 system generates output tokens. Amazon's Elastic Fabric Adapter networking ties the two together.
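The split works because the two stages have opposite performance profiles: prompt processing (prefill) is parallel across all input tokens, while token generation (decode) is strictly sequential, each step depending on the last. A toy sketch of that structure, not the actual AWS or Cerebras API:

```python
# Toy two-stage inference pipeline (illustrative only; function names are mine).

def prefill(prompt_tokens):
    # Prompt processing scores every input token at once, which is
    # embarrassingly parallel and maps well to a large accelerator pool.
    return {"context": list(prompt_tokens)}

def decode(state, max_new_tokens):
    # Token generation is sequential: each new token depends on the previous
    # one, so raw per-step speed dominates here.
    out = []
    for _ in range(max_new_tokens):
        next_tok = state["context"][-1] + 1  # stand-in for a model forward pass
        out.append(next_tok)
        state["context"].append(next_tok)
    return out

state = prefill([1, 2, 3])
print(decode(state, 4))  # [4, 5, 6, 7]
```

In the AWS/Cerebras design, the handoff between the two stages happens over Elastic Fabric Adapter networking rather than in-process as above.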
“Inference is where AI delivers real value to customers, but speed remains a critical bottleneck,” said David Brown, Vice President of Compute and ML Services at AWS. The hybrid approach targets demanding workloads like real-time coding assistance and interactive applications.
A Pizza Party and a Grinder at 3 AM
The Trainium lab occupies the back of a high floor in a chrome-windowed building in Austin's Domain district. It looks like a cross between a shop class and a Hollywood set's idea of a high-end facility. Shelves of equipment fill a space roughly the size of two large conference rooms, and the noise from cooling fans is constant.
The team traces its roots to Annapurna Labs, an Israeli chip designer Amazon acquired in January 2015 for roughly $350 million. More than a decade later, the Annapurna name and logo remain everywhere in the office.
The most intense moments come during “bring-up,” the overnight marathon when a chip is powered on for the first time after 18 months of design work. For Trainium3, the heat sink dimensions on the air-cooled prototype were off, so the team grabbed a grinder and started shaving metal in a conference room rather than disrupt the pizza-party atmosphere. Engineers work around the clock for three to four weeks during these events, racing to prove the silicon works before mass production begins.
Why Switching from Nvidia Just Got Easier
Amazon has tackled Nvidia's strongest competitive moat by adding full PyTorch support to Trainium. Carroll told TechCrunch the transition requires “basically a one-line change, and then recompile, and then run on Trainium.” That compatibility extends to many open-source models hosted on Hugging Face.
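In practice, the “one-line change” for a PyTorch workload is typically the device assignment. A minimal sketch, assuming the torch-xla device pattern the AWS Neuron SDK uses (the `torch_xla` import and fallback logic here are illustrative, not the article's code):

```python
# Illustrative sketch: moving an existing PyTorch model to a Trainium-style
# XLA device. Falls back to CPU when the Neuron SDK is not installed.
import torch
import torch.nn as nn

try:
    import torch_xla.core.xla_model as xm  # assumption: Neuron exposes Trainium via XLA
    device = xm.xla_device()
except ImportError:
    device = torch.device("cpu")  # local fallback for development machines

model = nn.Linear(16, 4).to(device)   # the "one-line change" is the .to(device)
x = torch.randn(2, 16).to(device)
y = model(x)
print(tuple(y.shape))  # (2, 4)
```

Because the rest of the training or inference script is unchanged, the same source runs on GPUs or Trainium depending on which device the tensors land on.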
Apple validated the team's work back in 2024 when its director of AI publicly described using Graviton, the low-power ARM server CPU that was the lab's first breakout product. Andy Jassy has called Trainium a multibillion-dollar business for AWS and singled it out as one piece of AWS technology he is most excited about.
“So far, we've been doing really well,” Carroll said. With two of the world's leading AI labs now committed and a fourth chip generation underway, the pressure on this Austin lab will only grow.
FAQs
What is Amazon Trainium and what does it do?
Amazon Trainium is a custom AI chip designed by AWS for both training and running AI models. It is manufactured by TSMC on a 3-nanometer process and is positioned as a lower-cost alternative to Nvidia GPUs, with the latest Trainium3 delivering 2.52 petaflops of FP8 compute per chip.
How does Trainium3 compare to Nvidia GPUs on cost?
Amazon claims Trainium3 running on its Trn3 UltraServers costs up to 50% less than conventional cloud GPU servers for comparable performance. The chip also delivers 4.4x more compute performance and 4x greater energy efficiency than Trainium2.
Why did OpenAI agree to use Amazon Trainium instead of sticking with Nvidia?
OpenAI's deal with Amazon includes a $50 billion investment and access to 2 gigawatts of Trainium capacity across Trainium3 and Trainium4. The cost savings at scale and AWS's exclusive distribution of OpenAI's Frontier agent platform made the partnership commercially attractive.
What is Project Rainier?
Project Rainier is one of the world's largest AI compute clusters, built by AWS with 500,000 Trainium2 chips. It went live in late 2025 and is used exclusively by Anthropic for training and deploying its Claude models.
Can developers easily switch from Nvidia to Trainium?
Yes. Amazon says Trainium now supports PyTorch, which means developers can port many existing AI models with minimal code changes. The engineering team described the migration as requiring “basically a one-line change, and then recompile.”
What is the AWS and Cerebras partnership about?
Announced in March 2026, the partnership combines AWS Trainium for prompt processing with the Cerebras CS-3 system for token generation, connected via Elastic Fabric Adapter networking. The goal is to deliver the fastest inference available through Amazon Bedrock.
Amazon Trainium at a glance
Latest generation: Trainium3 (3 nm)
FP8 per chip: 2.52 petaflops
Memory per chip: 144 GB HBM3e
Chips deployed: 1.4 million total
UltraServer peak: 362 petaflops
Cost vs. GPUs: Up to 50% less