The European High Performance Computing Joint Undertaking (EuroHPC JU)

Nested Matryoshka Clustering for Scalable Visual Representation Learning

Awarded Resources: 200,000 node hours
System Partition: Leonardo BOOSTER
Allocation Period: August 2025 - 6 months

This project proposes a new vision foundation model that rivals, and often surpasses, leading proprietary models such as DINOv2, CLIP, and SigLIPv2. It is built on a fully transparent training pipeline inspired by Web-SSL, using only publicly available datasets such as ImageNet-21K and a subset of ReLAION-2B.

In this EuroHPC project, the researchers will significantly scale up and extend this vision foundation model along three critical axes: (1) high-resolution finetuning to further enhance performance on dense prediction and localization tasks, (2) distillation of large-scale models into smaller, efficient variants to enable deployment in resource-constrained environments, and (3) training a 7B-parameter vision foundation model to serve as a highly capable, open-source backbone for multimodal and downstream tasks.
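The distillation axis above can be illustrated with a minimal sketch. The project does not specify its distillation objective, so everything here is an assumption for illustration: a feature-level objective in which a smaller student's embeddings are linearly projected into the teacher's space and trained to match via cosine similarity. All sizes and the random projection are placeholders standing in for trained components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a large teacher embedding and a smaller student one.
d_teacher, d_student, batch = 1024, 384, 16
teacher_feats = rng.standard_normal((batch, d_teacher))
student_feats = rng.standard_normal((batch, d_student))

# Placeholder for a learned linear head mapping student features
# into the teacher's embedding space.
proj = rng.standard_normal((d_student, d_teacher)) / np.sqrt(d_student)

def cosine_distill_loss(student, teacher):
    """Mean (1 - cosine similarity) between projected student and teacher features."""
    s = student @ proj
    s /= np.linalg.norm(s, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))

loss = cosine_distill_loss(student_feats, teacher_feats)
```

The loss is bounded in [0, 2] and reaches 0 only when every projected student feature points in the same direction as its teacher counterpart; in practice the projection and the student backbone would both be optimized against this objective.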

Alongside model development, this project addresses fundamental limitations in self-supervised learning (SSL) clustering methods. Contemporary approaches rely heavily on clustering algorithms such as Sinkhorn-Knopp, which enforce balanced cluster assignments but ignore the semantic ambiguity present in image representations. To overcome this, the project introduces a parameter-efficient, multi-head clustering projector based on nested Matryoshka representations, enabling progressive feature refinement into increasingly granular clusters without increasing model size, yielding gains in both accuracy and memory efficiency.
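A minimal sketch of the two ingredients named above, under stated assumptions: the nesting is implemented here as coarse-to-fine heads that reuse prefixes of one shared prototype matrix (so finer heads add prototypes without duplicating coarse ones), and Sinkhorn-Knopp is the standard balanced soft-assignment used in SSL clustering. The dimensions, cluster counts, and the prefix-sharing scheme are illustrative assumptions, not the project's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: embedding dimension D and coarse-to-fine cluster counts.
D = 64
cluster_counts = [256, 1024, 4096]  # nested granularities (illustrative)

# One shared prototype matrix; head i reuses the first k_i columns, so the
# multi-head projector grows in granularity without growing in parameters.
prototypes = rng.standard_normal((D, cluster_counts[-1])) / np.sqrt(D)

def matryoshka_logits(features, k):
    """Cluster-assignment logits for the head using the first k prototypes."""
    return features @ prototypes[:, :k]

def sinkhorn_knopp(logits, eps=1.0, n_iter=200):
    """Balanced soft assignment: rows are samples, columns are clusters."""
    Q = np.exp((logits - logits.max()) / eps)  # shift max for stability
    Q /= Q.sum()
    n, k = Q.shape
    for _ in range(n_iter):
        Q /= Q.sum(axis=0, keepdims=True)  # equalize cluster mass
        Q /= k
        Q /= Q.sum(axis=1, keepdims=True)  # equalize sample mass
        Q /= n
    return Q * n  # each row is a distribution over clusters

features = rng.standard_normal((8, D))
head_logits = [matryoshka_logits(features, k) for k in cluster_counts]
assignments = [sinkhorn_knopp(l) for l in head_logits]
```

The nesting property is what makes the projector parameter-efficient: the coarse head's logits are exactly the first columns of the finer head's logits, so refinement reuses rather than replaces the coarse structure.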

This work sets a new bar for open, reproducible, and high-performance vision foundation models, and aligns with EuroHPC’s mission to support large-scale, cutting-edge AI research that benefits the broader scientific and industrial community.