The European High Performance Computing Joint Undertaking (EuroHPC JU)

Scaling-Laws of Multimodal Models for Document Understanding

Awarded resources: 45,000 node hours
System partition: MareNostrum5 ACC
Allocation period: February 2026 - August 2026

This project aims to establish data-compute-model scaling laws for multimodal systems tailored to document understanding. Current vision-language models exhibit strong general performance yet remain suboptimal on specialized document domains such as maps, comics, and engineering drawings due to mismatched pre-training distributions. By systematically varying training data mixtures, task formulations, and model scales across both pre-training and post-training phases, the study seeks to identify the most effective training recipes for robust document understanding. The resulting scaling laws will enable predictions of performance at larger scales and inform the development of efficient, domain-specialized multimodal architectures.
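As a rough illustration of how a scaling law supports prediction at larger scales, the sketch below fits a simple power law, loss = a * C^(-b), to a few small-scale runs and extrapolates to a larger compute budget. The functional form, the function names (`fit_power_law`, `predict`), and all numbers are assumptions made for illustration; the project description does not specify which form or fitting procedure will be used, and real fits often add an irreducible-loss term that is omitted here.

```python
import math

def fit_power_law(compute, loss):
    """Least-squares fit of loss = a * compute**(-b) in log-log space.

    Hypothetical helper: a straight-line fit on (log C, log L) pairs.
    Returns the coefficient a and the exponent b.
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

def predict(a, b, compute):
    """Evaluate the fitted power law at a given compute budget."""
    return a * compute ** (-b)

# Synthetic, made-up "measurements" generated from loss = 5 * C**-0.1.
# These are NOT results from the project.
compute = [1e18, 1e19, 1e20, 1e21]
loss = [5.0 * c ** -0.1 for c in compute]

a, b = fit_power_law(compute, loss)
extrapolated = predict(a, b, 1e23)  # predicted loss at 100x the largest run
print(f"a={a:.3f}, b={b:.3f}, predicted loss at 1e23: {extrapolated:.4f}")
```

In practice the same fit would be repeated per data mixture or task formulation, so that the fitted exponents can be compared across training recipes.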