Multimodal LLMs for Low-Resource Languages

Awarded Resources: 65,000 node hours

Leonardo BOOSTER System Partition

February 2026 - August 2026 Allocation Period

AI Technology: Generative Language Modeling; Vision (image recognition, image generation, optical character recognition (OCR), etc.).

Motivated by the needs of the Horizon Europe project LUMINOUS, the team proposes a novel way to develop Multimodal LLMs (MLLMs) for low-resource languages (LRLs) by adapting a strong, open, English-centric MLLM.

This approach aims to preserve the multimodal skills already acquired by such models and to transfer them to LRLs under different data availability scenarios, a direction that previous work has not explored.

The team will focus on four European LRLs (Basque, Norwegian, Catalan and Galician) to help ensure that the methodology generalizes across different LRLs.

As a result, we will produce dedicated MLLMs for each target LRL; more importantly, we will contribute a generic methodology for adapting strong MLLMs to LRLs.

Principal Investigator, Affiliation and Country

Gorka Azkune, University of the Basque Country, Spain