AI Technology: Natural Language Processing; Vision (image recognition, image generation, text recognition (OCR), etc.).
This project aims to advance the frontier of efficient multimodal alignment through the continued development of Modality Linear Representation-Steering (MoReS)—a lightweight, scalable fine-tuning framework for visual instruction tuning in Multimodal Large Language Models (MLLMs).
MoReS has already demonstrated the ability to match state-of-the-art performance while using over 500× fewer trainable parameters than conventional methods, presenting a promising path toward eco-efficient, high-performing MLLMs.
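To make the parameter-efficiency claim concrete, the core idea of linear representation steering can be sketched as a single trainable direction that shifts only the visual-token hidden states of an otherwise frozen LLM. This is an illustrative, hypothetical sketch under assumed shapes, not the MoReS implementation; the class and parameter names are invented for exposition.

```python
import torch
import torch.nn as nn

class LinearSteering(nn.Module):
    """Hypothetical sketch of linear representation steering.

    A single trainable direction of size d_model is added to the
    hidden states of visual tokens; text tokens and the backbone
    LLM remain untouched, so only d_model parameters are trained.
    """

    def __init__(self, d_model: int):
        super().__init__()
        # The only trainable parameters: one steering vector.
        self.direction = nn.Parameter(torch.zeros(d_model))

    def forward(self, hidden: torch.Tensor, visual_mask: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model); visual_mask: (batch, seq) bool.
        # Shift only positions flagged as visual tokens.
        return hidden + visual_mask.unsqueeze(-1) * self.direction
```

With a frozen 7B backbone, such a module trains only `d_model` values (e.g. 4096), which is the kind of gap the reported 500× reduction in trainable parameters refers to.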
To scale this approach responsibly, the project proposes three research directions:
- Non-linear and compositional representation steering to capture complex cross-modal semantics without increasing model size;
- Cross-task and cross-model generalization, validating the robustness of MoReS across domains while minimizing redundant training cycles;
- The development of GreenMLLM-Bench, a new suite of alignment diagnostics and token-level attribution tools for jointly evaluating performance and energy efficiency.
The research team's work is directly aligned with global efforts toward Green AI and sustainable computing.
Access to GPUs will allow us to perform controlled ablation studies, scaling experiments, and visual diagnostics under low-resource fine-tuning regimes, ultimately contributing to the design of low-carbon MLLMs for real-world deployment.
Volker Tresp, LMU Munich, Germany