The European High Performance Computing Joint Undertaking (EuroHPC JU)

3D-Consistent Expandable Scene Generation from a Single Image


Awarded Resources: 76,000 node hours
System Partition: Leonardo BOOSTER
Allocation Period: July 2026 to January 2027

This project aims to generate geometrically consistent, explicit, and expandable three-dimensional (3D) scene representations from a single RGB image or limited multi-view inputs. Single-image 3D scene reconstruction is a critical problem in computer vision and graphics: in applications such as virtual and augmented reality, robotic simulation, and autonomous systems, scene information is often limited, and the accuracy of the 3D representation directly affects quality, safety, and efficiency. Existing approaches often rely on 2D projections or multi-view inputs and frequently produce geometric inconsistencies in single-image scenarios. The project addresses this gap by developing a parametric, differentiable 3D Gaussian-based scene representation that can be optimized while maintaining both geometric and photometric consistency.

The proposed method comprises four main components: (i) initial scene representation construction, (ii) geometry-conditioned residual video diffusion for multi-view video synthesis, (iii) learnable camera trajectory optimization, and (iv) iterative scene refinement with integration of external generative models. Rather than generating frames from scratch, the video synthesis module refines the scene by applying residual corrections to existing Gaussian render outputs; super-resolution models enhance fine details, and a feedback mechanism updates the scene parameters accordingly. Camera trajectory optimization selects viewpoints according to scene information density and model sensitivity, which improves scene coverage, produces more stable video sequences, and enhances the accuracy of the reconstructed 3D representation.

This work represents the first systematic solution that unifies 3D representation, video synthesis, and external model integration under a joint energy function for single-image 3D scene expansion.
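The residual correction, joint energy, and viewpoint selection described above can be illustrated with a toy sketch. This is not the project's implementation. All function names, the mean-squared-error terms, the weights, and the greedy top-k selection are assumptions made for illustration only, and images and depth maps are reduced to flat lists of floats in [0, 1].

```python
from typing import List, Sequence


def apply_residual(render: Sequence[float], residual: Sequence[float]) -> List[float]:
    """Add a predicted correction to an existing render and clip to [0, 1]."""
    return [min(1.0, max(0.0, r + d)) for r, d in zip(render, residual)]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def joint_energy(render, target, depth, depth_ref, w_photo=1.0, w_geo=0.5) -> float:
    """Weighted sum of a photometric term and a geometric (depth) term.

    Both terms and both weights are hypothetical placeholders.
    """
    return w_photo * mse(render, target) + w_geo * mse(depth, depth_ref)


def select_viewpoints(density: Sequence[float], sensitivity: Sequence[float], k: int) -> List[int]:
    """Return indices of the k candidate views with the highest density * sensitivity score."""
    scores = [d * s for d, s in zip(density, sensitivity)]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy data: a 3-pixel render, its target, and a residual predicted by some model.
render = [0.2, 0.5, 0.9]
target = [0.3, 0.5, 1.0]
residual = [0.1, 0.0, 0.2]
depth = [1.0, 2.0, 3.0]
depth_ref = [1.0, 2.0, 3.0]

refined = apply_residual(render, residual)
energy_before = joint_energy(render, target, depth, depth_ref)
energy_after = joint_energy(refined, target, depth, depth_ref)

# Four candidate camera poses scored by assumed density and sensitivity values.
chosen = select_viewpoints([0.9, 0.2, 0.6, 0.4], [0.5, 0.9, 0.8, 0.1], k=2)
```

In this toy setting the residual moves the render toward the target, so the energy after refinement is lower than before. In a differentiable pipeline the same scalar energy would instead be minimized by gradient descent over the Gaussian and camera parameters.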
The outcomes are applicable to digital content creation, virtual and augmented reality, robotic simulation, and autonomous system testing, enhancing consistency, quality, and efficiency in scene generation while providing scientific and technological contributions.

Principal Investigator, Company and Country

Aysegul Dundar, Bilkent University, Turkey