This project aims to generate geometrically consistent, explicit, and expandable three-dimensional (3D) scene representations from a single RGB image or limited multi-view inputs. Single-image 3D scene reconstruction is a critical problem in computer vision and graphics: in applications such as virtual and augmented reality, robotic simulation, and autonomous systems, scene information is often limited, and the accuracy of the resulting 3D representation directly affects quality, safety, and efficiency. Existing approaches often rely on 2D projections or multi-view inputs and frequently produce geometrically inconsistent results when only a single image is available. This project addresses this gap by developing a parametric, differentiable scene representation based on 3D Gaussians, which enables optimization that maintains both geometric and photometric consistency. The proposed method comprises four main components: (i) construction of an initial scene representation, (ii) geometry-conditioned residual video diffusion for multi-view video synthesis, (iii) learnable camera trajectory optimization, and (iv) iterative scene refinement that integrates external generative models. Rather than generating video from scratch, the video synthesis module refines the scene by predicting residual corrections to the rendered outputs of the existing Gaussians; super-resolution models enhance fine details, and a feedback mechanism updates the scene parameters accordingly. Camera trajectory optimization selects viewpoints according to scene information density and model sensitivity, which improves scene coverage, yields more stable video sequences, and increases the accuracy of the reconstructed 3D representation. This work represents the first systematic solution that unifies 3D representation, video synthesis, and external model integration under a joint energy function for single-image 3D scene expansion.
The outcomes are applicable to digital content creation, virtual and augmented reality, robotic simulation, and autonomous system testing, where they can improve the consistency, quality, and efficiency of scene generation, and they provide both scientific and technological contributions.
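The abstract does not specify how each Gaussian in the scene representation is parameterized. A common choice in Gaussian-based rendering is a mean, an anisotropic covariance factored as Σ = R S Sᵀ Rᵀ (a rotation R from a unit quaternion and a diagonal scale S), and an opacity, which keeps Σ positive definite while every parameter stays freely optimizable. The sketch below assumes that parameterization (color omitted); `Gaussian3D`, `covariance`, and `weight` are hypothetical names, and in a real pipeline these quantities would be tensors in an autodiff framework so that geometric and photometric losses can be backpropagated to them.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian3D:
    """One anisotropic 3D Gaussian primitive (assumed, hypothetical parameterization)."""
    mean: tuple[float, float, float]             # center position mu
    scale: tuple[float, float, float]            # per-axis standard deviations (> 0)
    rotation: tuple[float, float, float, float]  # quaternion (w, x, y, z), normalized below
    opacity: float                               # alpha in [0, 1]


def quat_to_matrix(q):
    """Convert a quaternion (w, x, y, z) into a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(g):
    """Sigma = R S S^T R^T, i.e. Sigma_ij = sum_k R_ik * s_k^2 * R_jk."""
    R = quat_to_matrix(g.rotation)
    s2 = [s * s for s in g.scale]
    return [[sum(R[i][k] * s2[k] * R[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]


def weight(g, p):
    """Opacity-scaled, unnormalized density alpha * exp(-0.5 * d^T Sigma^{-1} d) at point p."""
    R = quat_to_matrix(g.rotation)
    d = [p[i] - g.mean[i] for i in range(3)]
    # Express the offset in the Gaussian's local frame: local = R^T d.
    local = [sum(R[i][k] * d[i] for i in range(3)) for k in range(3)]
    # Since Sigma^{-1} = R S^{-2} R^T, the Mahalanobis term is a sum of scaled squares.
    m = sum((local[k] / g.scale[k]) ** 2 for k in range(3))
    return g.opacity * math.exp(-0.5 * m)


g = Gaussian3D(mean=(0.0, 0.0, 0.0), scale=(1.0, 0.5, 0.5),
               rotation=(1.0, 0.0, 0.0, 0.0), opacity=0.8)
print(covariance(g))           # diagonal: 1.0, 0.25, 0.25
print(weight(g, (1.0, 0.0, 0.0)))  # 0.8 * exp(-0.5)
```

Factoring the covariance this way, instead of optimizing Σ directly, is the design choice that lets gradient updates move the scale and rotation parameters without ever producing an invalid (non-positive-definite) covariance.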
Principal Investigator, Company and Country
Aysegul Dundar, Bilkent University, Turkey