The European High Performance Computing Joint Undertaking (EuroHPC JU)

AtomBagFM: Scalable Foundational Models for Small Molecules

Awarded Resources (in node hours): 50000
System Partition: Leonardo BOOSTER
Allocation Period: June 2025 - May 2026

AI Technology: Natural Language Processing; Deep Learning.

AtomBagFM aims to develop a scalable foundational model for small molecules, addressing a key gap in molecular representation learning.

While existing models excel on biomolecules with a linear structure, such as proteins or DNA, small molecules have no natural sequence ordering, which makes them hard to model effectively at scale.

To overcome this, AtomBagFM introduces a hybrid approach that combines the strengths of graph-based and language-based modeling. 

Molecules are represented as unordered sets of atoms with structural context, enabling the model to learn chemically meaningful representations without imposing artificial orderings.
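The idea of an order-free representation can be illustrated with a minimal sketch. This is not the AtomBagFM implementation: the function name `atom_bag`, the token format (an element symbol paired with the sorted symbols of its bonded neighbors), and the toy molecules are all assumptions chosen for illustration. The sketch only shows that a multiset of atom tokens with local structural context stays the same when the atoms are listed in a different order.

```python
from collections import Counter

def atom_bag(atoms, bonds):
    """Hypothetical helper: build a multiset of (element, neighbor-elements) tokens.

    atoms: list of element symbols, indexed by atom id
    bonds: list of (i, j) pairs of atom ids
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(atoms[j])
        neighbors[j].append(atoms[i])
    # Sorting the neighbor symbols removes any dependence on bond order;
    # Counter removes any dependence on atom order.
    return Counter(
        (atoms[i], tuple(sorted(neighbors[i]))) for i in range(len(atoms))
    )

# Ethanol heavy atoms, listed as C-C-O
ethanol_a = atom_bag(["C", "C", "O"], [(0, 1), (1, 2)])
# The same molecule with the atoms listed as O-C-C
ethanol_b = atom_bag(["O", "C", "C"], [(0, 1), (1, 2)])
# Dimethyl ether (C-O-C): same atoms, different connectivity
ether = atom_bag(["C", "O", "C"], [(0, 1), (1, 2)])

print(ethanol_a == ethanol_b)  # same bag regardless of atom ordering
print(ethanol_a == ether)      # different structural context, different bag
```

In this toy setting the two ethanol listings produce identical bags, while dimethyl ether, which has the same atoms but different bonding, does not. A real model would presumably use richer structural context than first neighbors.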

Trained on ultra-large molecular datasets, AtomBagFM is designed to support a wide range of downstream applications, including molecular property prediction, generative chemistry, and drug discovery. 

The project advances our understanding of molecular representation at scale and contributes to the development of general-purpose AI tools for biomedical research.