Enhancing Medical RAG with Document-Level Information Extraction for Robust Clinical Reasoning

Awarded Resources (in node hours): 85549

System Partition: Leonardo BOOSTER

Allocation Period: May 2026 - November 2026

Modern clinical reasoning depends on accessing and synthesizing vast amounts of biomedical knowledge that far exceed human cognitive capacity, making robust retrieval and inference crucial for applications such as early disease detection.

Although Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm, current retrieval mechanisms degrade at scale, often failing to surface critical evidence and compromising downstream reasoning.

The team proposes a strengthened RAG framework centered on document-level information extraction (IE), which builds structured, cross-document knowledge representations to enable more reliable retrieval and multi-hop inference over large medical corpora.

The approach advances document-level IE through explicit reasoning strategies and improved test-time scaling, and systematically explores how structured knowledge can be integrated effectively into large language models (LLMs).

Overall, the project seeks to deliver significantly enhanced biomedical IE and retrieval capabilities, with direct contributions to the Horizon Europe ECHOLOT project.

Principal Investigator, Institutions and Country

Oier Lopez de Lacalle, University of the Basque Country, Spain