Researchers have developed an LLM serving stack designed specifically for fraud detection and anti-money laundering (AML) compliance workloads, reporting throughput roughly five to six times that of generic chat-optimized systems along with much lower tail latency.

The research, published by Prathamesh Vasudeo Naik and colleagues, addresses a critical gap in how financial institutions deploy large language models for regulatory compliance tasks.

Compliance prompts differ significantly from typical chatbot interactions. They combine reusable policy instructions, risk taxonomies, and transaction evidence, and they require structured JSON outputs rather than conversational responses.

Performance gains through workload optimization

The specialized stack improved throughput from 612-650 requests per hour to 3,600 requests per hour across public synthetic AML datasets. P99 latency dropped from 31-38 seconds to 6.4-8.7 seconds, while GPU utilization increased from 12% to 78%.
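
To put those figures in proportion, the short calculation below derives the implied speedup from the numbers reported above. It is only arithmetic on the article's figures; the helper name `speedup_range` is made up for this illustration.

```python
# Figures as reported in the article. The baseline is a range because it
# varies across the datasets tested.
baseline_rph = (612, 650)   # requests per hour, generic stack
optimized_rph = 3600        # requests per hour, specialized stack


def speedup_range(baseline, optimized):
    """Return (smallest, largest) speedup for a (low, high) baseline range."""
    low, high = baseline
    return optimized / high, optimized / low


throughput_lo, throughput_hi = speedup_range(baseline_rph, optimized_rph)
requests_per_second = optimized_rph / 3600   # 3,600 per hour is 1 per second
gpu_utilization_ratio = 78 / 12              # utilization went from 12% to 78%

print(f"throughput speedup: {throughput_lo:.1f}x to {throughput_hi:.1f}x")
print(f"sustained rate: {requests_per_second:.0f} request per second")
print(f"GPU utilization ratio: {gpu_utilization_ratio:.1f}x")
```

The throughput gain works out to roughly 5.5x to 5.9x, close to the 6.5x rise in GPU utilization, which is consistent with the stack mainly recovering idle GPU capacity.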

The architecture combines vLLM-style runtime tuning with PagedAttention, automatic prefix caching, and multi-adapter serving. It also includes adapter-aware and prompt-length-aware batching, sleep/wake lifecycle management, and speculative decoding.
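
As a rough illustration of what adapter-aware and prompt-length-aware batching can mean, the sketch below groups pending requests by adapter and by a prompt-length bucket before cutting them into batches. This is not the researchers' scheduler: the `Request` fields, the bucket edges, and the batch size are all assumptions chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    request_id: str
    adapter: str        # hypothetical adapter name, e.g. one per compliance task
    prompt_tokens: int  # prompt length in tokens


def bucket_index(prompt_tokens, edges=(512, 2048, 8192)):
    """Return the index of the first length bucket that fits the prompt.

    The edges are illustrative. Prompts longer than the last edge fall
    into an overflow bucket with index len(edges).
    """
    for i, edge in enumerate(edges):
        if prompt_tokens <= edge:
            return i
    return len(edges)


def make_batches(requests, max_batch_size=8):
    """Group requests by (adapter, length bucket), then split into batches.

    Requests in a batch share an adapter and have similar prompt lengths,
    so a batch needs no adapter switch and wastes little padding.
    """
    groups = defaultdict(list)
    for req in requests:
        groups[(req.adapter, bucket_index(req.prompt_tokens))].append(req)

    batches = []
    for key in sorted(groups):
        members = groups[key]
        for start in range(0, len(members), max_batch_size):
            batches.append(members[start:start + max_batch_size])
    return batches


pending = [
    Request("a", "aml", 300),
    Request("b", "fraud", 400),
    Request("c", "aml", 500),
    Request("d", "aml", 3000),
]
batches = make_batches(pending, max_batch_size=2)
batch_ids = [[r.request_id for r in batch] for batch in batches]
print(batch_ids)
```

A production scheduler would also have to weigh queueing delay against batch fullness, which this sketch ignores.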

The system uses self-hosted open-weight models, including Meta Llama and Alibaba Qwen, rather than proprietary APIs, so that sensitive financial data is not exposed to third parties.

Quality assurance for regulated environments

The researchers incorporated an LLM-as-judge quality gate using deterministic compliance checks and expert-adjudicated calibration data. This addresses the critical need for explainable and auditable AI decisions in regulated financial environments.
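
The deterministic part of such a gate can be pictured as a set of rule-based checks that run on every model output before any judge model is consulted. The sketch below is an assumption-laden example, not the paper's checks: the field names, the risk taxonomy, and the rule that cited evidence must exist in the prompt are all hypothetical.

```python
import json

# Hypothetical schema and taxonomy, chosen only for this illustration.
REQUIRED_FIELDS = {"risk_level", "typology", "rationale", "evidence_ids"}
ALLOWED_RISK_LEVELS = {"low", "medium", "high"}


def deterministic_checks(raw_output, known_evidence_ids):
    """Return a list of failure reasons; an empty list means the output passes."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]

    failures = []
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        failures.append(f"missing fields: {sorted(missing)}")

    level = data.get("risk_level")
    if not isinstance(level, str) or level not in ALLOWED_RISK_LEVELS:
        failures.append("risk_level outside allowed taxonomy")

    cited = data.get("evidence_ids", [])
    if not isinstance(cited, list) or not all(isinstance(c, str) for c in cited):
        failures.append("evidence_ids must be a list of strings")
    else:
        unknown = sorted(set(cited) - set(known_evidence_ids))
        if unknown:
            failures.append(f"cites unknown evidence: {unknown}")
    return failures


good = json.dumps({
    "risk_level": "high",
    "typology": "structuring",
    "rationale": "Repeated deposits just under the reporting threshold.",
    "evidence_ids": ["tx-1", "tx-2"],
})
bad = json.dumps({
    "risk_level": "severe",
    "typology": "structuring",
    "rationale": "Unclear.",
    "evidence_ids": ["tx-9"],
})

good_failures = deterministic_checks(good, {"tx-1", "tx-2"})
bad_failures = deterministic_checks(bad, {"tx-1", "tx-2"})
print(good_failures)
print(bad_failures)
```

Because each failure is a plain-text reason tied to a specific rule, the result can be logged alongside the model output for later audit.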

The reproducibility track converts public synthetic AML datasets, including IBM AML and SAML-D, into prefix-heavy compliance prompts with reusable policy text and schema-constrained outputs.
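
A minimal sketch of what "prefix-heavy" means in practice: every prompt opens with the same policy text and output schema, and only the transaction evidence at the end changes. The policy wording, the schema, and the transaction field names below are invented for illustration and are not taken from IBM AML, SAML-D, or the paper.

```python
import json

# Hypothetical reusable policy text and output schema. Every prompt starts
# with this exact string, which is what makes prefix caching effective.
POLICY_PREFIX = (
    "You are an AML compliance analyst.\n"
    "Apply the institution's transaction monitoring policy to the evidence.\n"
    "Respond only with JSON of the form "
    '{"risk_level": "low|medium|high", "typology": "...", "rationale": "..."}.\n'
)


def build_prompt(transaction):
    """Place the shared policy prefix first and the variable evidence last."""
    evidence = json.dumps(transaction, sort_keys=True)
    return f"{POLICY_PREFIX}\nTransaction evidence:\n{evidence}\nJSON verdict:"


def shared_prefix_len(a, b):
    """Count the leading characters that two strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical rows standing in for records from a synthetic AML dataset.
rows = [
    {"amount": 9500.0, "currency": "USD", "from_account": "A1", "to_account": "B7"},
    {"amount": 120.5, "currency": "EUR", "from_account": "C3", "to_account": "D2"},
]
prompts = [build_prompt(row) for row in rows]
shared = shared_prefix_len(prompts[0], prompts[1])
print(f"{shared} of {len(prompts[0])} characters are shared across prompts")
```

Keeping the variable evidence at the end matters because prefix caches generally reuse work only for the identical leading portion of a prompt.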

The work demonstrates that LLM performance in regulated settings depends on workload-specific optimization, not just model selection, particularly for the prefix-heavy, evidence-rich compliance tasks that dominate financial services AI applications.