Groq is an AI inference company founded in 2016 by Jonathan Ross, who previously helped create Google's Tensor Processing Unit (TPU). The company is headquartered in Mountain View, California, and specializes in high-speed, low-cost AI model inference.
Groq's core innovation is the Language Processing Unit (LPU), a custom processor architecture designed for deterministic, low-latency inference rather than for model training. The company delivers this capability through GroqCloud, a developer platform that provides API access to popular open models.
Groq has raised significant venture funding and reached a multibillion-dollar valuation, with investors including BlackRock, Cisco, and Samsung Catalyst Fund. It has also secured large infrastructure commitments to expand inference capacity globally and has announced major regional data center partnerships.
Groq differentiates itself through its single-core, software-scheduled LPU design, which it positions as delivering substantially higher token throughput and lower latency than GPU-based inference for many language workloads. Its emphasis on speed and predictable performance targets latency-sensitive AI applications.
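To make the throughput and latency claim concrete, the sketch below estimates how long a user waits for a complete response under a simplified model: time to first token plus output tokens divided by a steady generation rate. The function name `response_time_s` and every number in it are hypothetical, chosen only to illustrate the arithmetic; they are not measurements of Groq or of any GPU system.

```python
def response_time_s(output_tokens: int, tokens_per_s: float, ttft_s: float) -> float:
    """Estimate seconds until a full response has been generated.

    Simplified model: time to first token (ttft_s) plus the time needed to
    stream output_tokens at a constant rate of tokens_per_s.
    """
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_s + output_tokens / tokens_per_s


# Hypothetical figures for illustration only, not benchmark results.
slower = response_time_s(output_tokens=500, tokens_per_s=50.0, ttft_s=0.5)
faster = response_time_s(output_tokens=500, tokens_per_s=500.0, ttft_s=0.5)

print(f"{slower:.1f} s vs {faster:.1f} s")
```

Under these assumed numbers, a tenfold increase in throughput cuts the wait for a 500-token answer from about ten seconds to under two, which is the kind of difference that matters for conversational and voice applications.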
The company serves developers and enterprises building real-time AI products such as chat assistants, agents, and voice applications where response speed is critical. Groq competes with GPU cloud providers and other inference-specialized startups in a rapidly expanding market for cost-efficient model serving.