Emergency medicine residents improved their diagnostic accuracy on difficult cases by roughly 25% in relative terms when using an interactive AI system that let them query a language model throughout their diagnostic process, according to new research from a multi-institutional team.

The study tested MedSyn, a system that provides physicians with AI assistance while they work through emergency cases. Unlike static AI tools, MedSyn allowed doctors to ask follow-up questions and refine their thinking through dialogue with the AI.

Seven physicians (three senior doctors and four residents) completed diagnostic sessions on 52 emergency cases drawn from the MIMIC-IV database. The cases were stratified by difficulty so that performance could be compared across levels of complexity.

Residents showed the largest gains

On hard cases, residents' accuracy rose from 58.9% to 73.4% with AI assistance, a gain of 14.5 percentage points. The effect size was medium (Cohen's d = 0.47), but with a p-value of 0.071 the result fell just short of the conventional 0.05 significance threshold.
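The 25% figure in the opening line is a relative change, while the 14.5-point jump is an absolute one. The short Python sketch below is illustrative and not taken from the paper: it shows both calculations on the reported resident numbers, plus the textbook pooled-standard-deviation form of Cohen's d. The function names are hypothetical, and the study may have computed its effect size differently, for example with a variant suited to paired comparisons.

```python
import math
from statistics import mean, stdev


def absolute_and_relative_gain(before: float, after: float) -> tuple[float, float]:
    """Return (gain in percentage points, gain relative to the baseline in percent)."""
    points = after - before
    relative = points / before * 100
    return points, relative


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Textbook Cohen's d for two independent groups, using the pooled sample SD.

    Hypothetical helper for illustration; the paper's exact variant is not stated.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    # Positive d means group_b scored higher than group_a.
    return (mean(group_b) - mean(group_a)) / math.sqrt(pooled_var)


# Residents' hard-case accuracy as reported: 58.9% without AI, 73.4% with AI.
points, relative = absolute_and_relative_gain(58.9, 73.4)
print(f"{points:.1f} points, {relative:.1f}% relative")  # 14.5 points, 24.6% relative
```

The roughly 24.6% relative gain is what rounds to the "25% higher" headline figure.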

Automated metrics were consistent with the human evaluation results. Standardized accuracy improved by 15.6 percentage points across all participants, and residents showed the largest F1 score gain, at 13.8 percentage points.

The dialogue analysis revealed different questioning strategies at different experience levels. Senior physicians asked targeted, hypothesis-driven questions to test specific diagnostic theories. Residents relied on broader, more exploratory queries to gather additional information.

Cross-expertise agreement between senior doctors and residents increased by 14.5 percentage points when both groups used AI assistance, suggesting the system helped align diagnostic reasoning across experience levels.

The research addresses a key gap in medical AI evaluation. While many studies test AI systems on static benchmarks, few examine how these tools perform when integrated into actual physician workflows with real-time interaction.

The paper is currently under peer review and represents one of the first controlled studies of interactive AI assistance in emergency medicine diagnostics.