This project proposes RegionReasoner, a reinforcement learning framework for region-grounded, multi-round visual reasoning. To support systematic evaluation, the project also introduces RegionDial-Bench, a benchmark constructed from RefCOCO+ and RefCOCOg that comprises roughly 10k multi-turn dialogues. Preliminary experiments with a 3B-parameter model on a subset of the data indicate that the approach is promising. The next step is to validate RegionReasoner's three core innovations (explicit reference citation, global–local semantic consistency, and structured multi-round reasoning) on the full dataset using a 7B-parameter model. Reinforcement learning optimization at this scale requires large-scale training resources, specifically 8×A100 GPUs. The requested EuroHPC allocation will make it possible to test RegionReasoner's contributions rigorously and to establish a strong baseline for multi-round, reference-grounded visual reasoning.
Cees Snoek, University of Amsterdam, Netherlands