AI Technology: Vision (image recognition, image generation, text recognition OCR, etc.); Generative Language Modeling
The team proposes to advance pixel-level generative models that generate images directly in the pixel domain, targeting high-fidelity details and scalable training.
While latent generative models offer efficiency, pixel-space modeling avoids decoding bottlenecks and preserves fine structure, but is limited by long sequences and training instability.
Using EuroHPC resources, we will train and benchmark pixel-level diffusion-based baselines (e.g., PixelDiT, JiT-style training) and develop improved training/evaluation protocols/AR paradigm through large-scale ablations across resolutions and model sizes, enabling reproducible, open research on scalable pixel generation.
Principal Investigator, Institution and Country
Matthew Blaschko, KU Leuven, Belgium