RegionReasoner: Region-Grounded Multi-Round Visual Reasoning

Awarded Resources (in node hours): 60000

System Partition: Leonardo BOOSTER

Allocation Period: December 2025 - June 2026

This project proposes RegionReasoner, a reinforcement learning framework for region-grounded multi-round visual reasoning. To support systematic evaluation, the project introduces RegionDial-Bench, a benchmark of approximately 10k multi-turn dialogues constructed from RefCOCO+ and RefCOCOg. Preliminary experiments with a 3B model on a subset of the data indicate that the approach is promising. The next step is to validate RegionReasoner’s core innovations (explicit reference citation, global–local semantic consistency, and structured multi-round reasoning) on the full dataset using a 7B model. This requires large-scale training resources, specifically 8×A100 GPUs, to run reinforcement learning optimization at sufficient scale. The requested EuroHPC allocation will make it possible to rigorously test RegionReasoner’s contributions and establish a strong baseline for multi-round, reference-grounded visual reasoning.

Principal Investigator, Research Team Institution & Country