Foundation Models for Clinical Brain MRI: Large-Scale Multi-Sequence Pretraining with Patient Metadata

38899 Awarded Resources (in node hours)

MeluXina GPU System Partition

July 2026 - January 2027 Allocation Period

This project will develop the first large-scale foundation models that learn from complete MRI sessions as they occur in hospitals: multiple imaging sequences at their native resolutions, integrated with clinical context such as patient demographics and scanner protocols. The work draws on a uniquely assembled dataset of ~2,700,000 brain MRI scans from ~1,000 public sources worldwide, spanning diverse MRI sequences and multiple pathologies, which makes it the largest and most diverse collection to date. On this dataset, the team will adapt modern transformer architectures to handle full-session, clinically contextualized 3D MRI data. Current models, by contrast, are trained on ~100,000 scans that are typically single-sequence, treated as i.i.d. samples, and heavily resampled; the proposed approach aims to capture the full complexity of clinical MRI. The resulting models will be evaluated on 30+ medical tasks, far more than the handful used in previous studies. All models will be released under permissive open licenses, lowering computational and data barriers for hospitals and researchers while enabling robust, generalizable, and clinically meaningful tools.

Principal Investigator, Company and Country

Mads Nielsen, University of Copenhagen, Denmark