AI Technology: Natural Language Processing; Deep Learning.
AtomBagFM aims to develop a scalable foundation model for small molecules, addressing a key gap in molecular representation learning.
While existing models work well for biomolecules with linear structures, such as proteins or DNA, small molecules have no natural sequential ordering, which makes them challenging to model effectively at scale.
To overcome this, AtomBagFM introduces a hybrid approach that combines the strengths of graph-based and language-based modeling.
Molecules are represented as unordered sets of atoms with structural context, enabling the model to learn chemically meaningful representations without imposing artificial orderings.
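As a rough illustration of the idea, and not AtomBagFM's actual tokenization, the sketch below builds an order-independent "bag" from a molecule by pairing each atom with the sorted elements of its bonded neighbors. The function name `atom_bag`, the input format (a list of element symbols plus a list of bonded index pairs), and the choice of one-hop neighbors as the structural context are all assumptions made for this example.

```python
from collections import Counter


def atom_bag(elements, bonds):
    """Return a multiset of (element, sorted neighbor elements) tokens.

    Hypothetical helper for illustration only. `elements` is a list of
    element symbols; `bonds` is a list of (i, j) index pairs.
    """
    neighbors = [[] for _ in elements]
    for i, j in bonds:
        neighbors[i].append(elements[j])
        neighbors[j].append(elements[i])
    # A Counter is a multiset, so the order in which atoms are listed is lost.
    return Counter(
        (element, tuple(sorted(context)))
        for element, context in zip(elements, neighbors)
    )


# Heavy atoms of ethanol listed as C-C-O.
bag_a = atom_bag(["C", "C", "O"], [(0, 1), (1, 2)])
# The same molecule with its atoms listed in reverse, O-C-C.
bag_b = atom_bag(["O", "C", "C"], [(0, 1), (1, 2)])

# Both atom orderings yield the same bag.
same_bag = bag_a == bag_b
```

Because the result is a multiset, no atom ordering has to be chosen, while the neighbor tuple keeps some local structural information attached to each atom.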
Trained on ultra-large molecular datasets, AtomBagFM is designed to support a wide range of downstream applications, including molecular property prediction, generative chemistry, and drug discovery.
The project advances our understanding of molecular representation at scale and contributes to the development of general-purpose AI tools for biomedical research.
Alexis Molina, Nostrum Biodiscovery, Spain