A domain-trained small language model has outperformed frontier large language models on legal contract extraction while slashing inference costs by up to 97%, according to new research published on arXiv.

The study tested Olava Extract, a self-hosted legal domain Mixture of Experts model, against five unnamed frontier models on structured contract extraction tasks. Olava Extract achieved a macro F1 score of 0.812 and micro F1 of 0.842, the strongest aggregate performance in the study.
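For readers unfamiliar with the two aggregate scores, the sketch below shows how macro and micro F1 differ. Macro F1 averages the per-field F1 scores, so every field counts equally. Micro F1 pools the counts across all fields first, so frequently occurring fields dominate. The field names and counts are hypothetical illustrations, not data from the study, and the paper's exact scoring procedure may differ.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one extraction field
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when the field has no counts at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(per_field: Dict[str, Counts]) -> float:
    """Unweighted mean of per-field F1 scores."""
    return sum(f1(*c) for c in per_field.values()) / len(per_field)


def micro_f1(per_field: Dict[str, Counts]) -> float:
    """F1 computed on counts pooled across all fields."""
    tp = sum(c[0] for c in per_field.values())
    fp = sum(c[1] for c in per_field.values())
    fn = sum(c[2] for c in per_field.values())
    return f1(tp, fp, fn)


# Hypothetical counts: a common, easy field and a rare, harder one.
example = {
    "party_name": (90, 5, 5),
    "governing_law": (8, 2, 6),
}

print(f"macro F1: {macro_f1(example):.3f}")
print(f"micro F1: {micro_f1(example):.3f}")
```

In this made-up example the micro score comes out higher than the macro score because the common field is extracted well and outweighs the rare one once the counts are pooled.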

The specialized model also delivered the highest precision scores, producing fewer hallucinated and unsupported extractions than the larger models. This distinction matters in legal workflows where AI hallucinations create operational risk and increase downstream review burden for lawyers.

Cost efficiency drives adoption

Olava Extract's inference costs were 78% to 97% lower than those of the frontier models tested. The research suggests that high-performing legal AI no longer requires the largest externally hosted models or massive infrastructure spending.
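To put the reported range in concrete terms, the sketch below applies a 78% and a 97% reduction to a baseline inference bill. The baseline figure of 1,000 is a hypothetical chosen for easy arithmetic; the study's actual cost figures are not reproduced here.

```python
def remaining_cost(baseline: float, reduction_pct: float) -> float:
    """Cost left after cutting `baseline` by `reduction_pct` percent."""
    if not 0 <= reduction_pct <= 100:
        raise ValueError("reduction_pct must be between 0 and 100")
    return baseline * (100 - reduction_pct) / 100


# Hypothetical baseline spend on a frontier model, in arbitrary currency units.
BASELINE = 1000.0

low_end = remaining_cost(BASELINE, 78)   # smallest reported reduction
high_end = remaining_cost(BASELINE, 97)  # largest reported reduction

print(f"after a 78% reduction: {low_end:.2f}")   # 220.00
print(f"after a 97% reduction: {high_end:.2f}")  # 30.00
```

Under that assumed baseline, the same workload would cost between 30 and 220 units instead of 1,000.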

The findings challenge assumptions about enterprise AI capability being tied to ever-larger models and centralized cloud providers. Domain-specific training appears to deliver superior results at a fraction of the cost for specialized use cases like contract analysis.

Legal AI companies like Harvey and Spellbook have raised significant funding to build AI tools for law firms. This research suggests smaller, specialized models could provide a more cost-effective path to deployment.

The paper was authored by researchers Nicole Lincoln, Nick Whitehouse, Jaron Mar, and Rivindu Perera. The study focused specifically on structured data extraction from legal contracts rather than general legal reasoning tasks.

The research indicates that commercially valuable enterprise AI applications may not need the computational resources of frontier models when a smaller model is properly trained on domain-specific data.