Researchers Build Compliance-Grade LLM Stack for Fraud Detection and AML

Researchers have developed an LLM serving stack built specifically for fraud detection and anti-money laundering (AML) compliance workloads, reporting several-fold gains in throughput, tail latency, and GPU utilization over generic chat-optimized systems.

The research, published by Prathamesh Vasudeo Naik and colleagues, addresses a critical gap in how financial institutions deploy large language models for regulatory compliance tasks.

Compliance prompts differ significantly from typical chatbot interactions. They combine reusable policy instructions, risk taxonomies, and transaction evidence, and they require structured JSON outputs rather than conversational responses.

Performance gains through workload optimization

The specialized stack improved throughput from 612-650 requests per hour to 3,600 requests per hour across public synthetic AML datasets. P99 latency dropped from 31-38 seconds to 6.4-8.7 seconds, while GPU utilization increased from 12% to 78%.
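As a quick sanity check on those figures, the short sketch below works out the implied improvement ratios. The article does not say which baseline figure pairs with which tuned figure, so the latency numbers are computed as lower and upper bounds rather than a single ratio; the variable names are illustrative only.

```python
# Figures as reported in the article.
baseline_throughput = (612, 650)   # requests per hour, low and high end
tuned_throughput = 3600            # requests per hour
baseline_p99 = (31.0, 38.0)        # seconds, low and high end
tuned_p99 = (6.4, 8.7)             # seconds, low and high end
baseline_gpu, tuned_gpu = 12, 78   # percent utilization

# Throughput gain: smallest when measured against the best baseline.
throughput_gain = (
    tuned_throughput / max(baseline_throughput),
    tuned_throughput / min(baseline_throughput),
)

# P99 reduction factor: the pairing is unknown, so report bounds.
p99_reduction = (
    min(baseline_p99) / max(tuned_p99),
    max(baseline_p99) / min(tuned_p99),
)

gpu_gain = tuned_gpu / baseline_gpu

print(f"throughput: {throughput_gain[0]:.2f}x to {throughput_gain[1]:.2f}x")
print(f"p99 latency: {p99_reduction[0]:.2f}x to {p99_reduction[1]:.2f}x lower")
print(f"GPU utilization: {gpu_gain:.1f}x")
```

In other words, throughput rose roughly 5.5x to 5.9x, P99 latency fell by a factor of somewhere between about 3.6 and 5.9, and GPU utilization rose 6.5x.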

The architecture combines vLLM-style runtime tuning with PagedAttention, automatic prefix caching, and multi-adapter serving. It includes adapter and prompt-length-aware batching, sleep/wake lifecycle management, and speculative decoding.
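To make the batching idea concrete, here is a minimal sketch of adapter and prompt-length-aware batching in plain Python. It is not the researchers' scheduler: the `Request` type, the `make_batches` function, and the `max_batch_size` and `max_length_ratio` parameters are all hypothetical, chosen only to show the principle of grouping requests that share an adapter and have similar prompt lengths.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    request_id: str
    adapter: str        # hypothetical adapter name, e.g. "aml" or "fraud"
    prompt_tokens: int  # prompt length in tokens


def make_batches(requests, max_batch_size=4, max_length_ratio=2.0):
    """Group requests by adapter, then pack similar-length prompts together."""
    by_adapter = defaultdict(list)
    for req in requests:
        by_adapter[req.adapter].append(req)

    batches = []
    for adapter in sorted(by_adapter):
        # Sorting by length keeps short prompts from waiting on long ones.
        group = sorted(by_adapter[adapter], key=lambda r: r.prompt_tokens)
        current = []
        for req in group:
            too_full = len(current) >= max_batch_size
            too_long = bool(current) and (
                req.prompt_tokens > max_length_ratio * current[0].prompt_tokens
            )
            if too_full or too_long:
                batches.append(current)
                current = []
            current.append(req)
        if current:
            batches.append(current)
    return batches


requests = [
    Request("a1", "fraud", 100),
    Request("a2", "aml", 120),
    Request("a3", "fraud", 900),
    Request("a4", "fraud", 110),
    Request("a5", "aml", 130),
]
batch_ids = [[r.request_id for r in batch] for batch in make_batches(requests)]
print(batch_ids)
```

With this sample input, the two AML requests form one batch, the two short fraud requests form another, and the 900-token fraud request is served on its own rather than stalling the shorter prompts.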

The system uses self-hosted open-weight models, including Meta Llama and Alibaba Qwen, rather than proprietary APIs, so that sensitive financial data is not exposed to third parties.

Quality assurance for regulated environments

The researchers incorporated an LLM-as-judge quality gate using deterministic compliance checks and expert-adjudicated calibration data. This addresses the critical need for explainable and auditable AI decisions in regulated financial environments.
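The deterministic half of such a gate can be sketched as a set of rule-based checks that run on the model's JSON output before any judge model is consulted. The field names, the risk taxonomy, and the `deterministic_checks` function below are assumptions made for illustration; the article does not describe the actual rules, and the judge call itself is left out.

```python
import json

# Hypothetical taxonomy and schema, not taken from the research.
ALLOWED_RISK_LEVELS = {"low", "medium", "high"}
REQUIRED_FIELDS = {"risk_level", "typology", "rationale", "evidence_ids"}


def deterministic_checks(raw_output, known_evidence_ids):
    """Return a list of failure reasons; an empty list means the output passes."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(parsed, dict):
        return ["output is not a JSON object"]

    failures = []
    missing = REQUIRED_FIELDS - set(parsed)
    if missing:
        failures.append("missing fields: " + ", ".join(sorted(missing)))

    risk = parsed.get("risk_level")
    if not isinstance(risk, str) or risk not in ALLOWED_RISK_LEVELS:
        failures.append("risk_level outside taxonomy")

    # Every cited evidence ID must exist in the prompt's evidence set,
    # which keeps the rationale auditable.
    cited = parsed.get("evidence_ids")
    if not isinstance(cited, list) or not cited:
        failures.append("no evidence cited")
    else:
        unknown = [
            e for e in cited
            if not isinstance(e, str) or e not in known_evidence_ids
        ]
        if unknown:
            failures.append("unknown evidence: " + ", ".join(map(str, unknown)))
    return failures


good = json.dumps({
    "risk_level": "high",
    "typology": "structuring",
    "rationale": "Repeated deposits just under the reporting threshold.",
    "evidence_ids": ["tx-1", "tx-2"],
})
print(deterministic_checks(good, {"tx-1", "tx-2", "tx-3"}))
print(deterministic_checks('{"risk_level": "severe"}', {"tx-1"}))
```

Running cheap, rule-based checks first means malformed or ungrounded outputs can be rejected with a specific, loggable reason, and only well-formed outputs need to reach the more expensive judge step.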

The reproducibility track converts public synthetic AML datasets, including IBM AML and SAML-D, into prefix-heavy compliance prompts with reusable policy text and schema-constrained outputs.
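A rough sketch of that conversion step follows. The policy text, the output schema, and the transaction field names are placeholders invented for illustration, not the actual columns of IBM AML or SAML-D and not the researchers' prompt template. The point is only that every prompt begins with the same policy and schema text, which is what makes the workload prefix-heavy and lets a prefix cache reuse that work.

```python
import json

# Placeholder policy and schema text shared by every prompt (hypothetical).
POLICY_PREFIX = (
    "You are an AML compliance analyst.\n"
    "Apply the institution's policy and risk taxonomy to the evidence.\n"
    "Return a JSON object with the keys: risk_level, typology, rationale."
)


def build_prompt(transaction):
    """Turn one dataset row (a dict) into a prefix-heavy compliance prompt."""
    # sort_keys gives a stable field order, so identical rows yield identical prompts.
    evidence = json.dumps(transaction, sort_keys=True)
    return f"{POLICY_PREFIX}\n\nTRANSACTION EVIDENCE:\n{evidence}\n\nRespond with JSON only."


# Hypothetical rows; real datasets have their own column names.
rows = [
    {"tx_id": "tx-1", "amount": 9900.0, "currency": "USD", "channel": "cash"},
    {"tx_id": "tx-2", "amount": 120.5, "currency": "EUR", "channel": "wire"},
]
prompts = [build_prompt(row) for row in rows]

# All prompts share the policy prefix, so only the evidence tail differs.
shared = all(p.startswith(POLICY_PREFIX) for p in prompts)
print(shared, len(POLICY_PREFIX))
```

Placing the reusable text first and the per-transaction evidence last is a deliberate ordering choice in this sketch: prefix caches can only reuse computation for the leading portion of a prompt that is identical across requests.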

The work demonstrates that regulated LLM performance requires workload-specific optimization beyond model selection, particularly for prefix-heavy, evidence-rich compliance tasks that dominate financial services AI applications.

