Next generation models for AI-Driven Drug Design

Awarded Resources (in node hours): 50000

System Partition: LUMI-G

Allocation Period: June 2025 - May 2026

AI Technology: Machine Learning & Deep Learning

Generative chemical models – AI systems that propose novel molecules – have emerged as practical assets in drug discovery, enabling multi-objective molecular optimization and accelerating preclinical discovery.

AstraZeneca's REINVENT models, developed in Sweden and released publicly, exemplify this shift: they have been used in both industry and academia to discover novel molecules against various drug targets.

Initially released in 2018, the classic REINVENT framework consists of a pre-trained chemical language model and a reinforcement learning system to tailor molecular properties to specific drug targets.
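The interplay between the pre-trained model and the reinforcement learning step can be illustrated with a small sketch. It assumes the augmented-likelihood objective described in the original REINVENT publication, where a frozen prior's log-likelihood is shifted by a scaled task score and the trainable agent is pulled toward that target. The function names and the value of `sigma` are hypothetical, chosen for illustration only; this is not the REINVENT package API.

```python
from statistics import mean


def augmented_log_likelihood(prior_logp: float, score: float, sigma: float) -> float:
    """Prior log-likelihood shifted upward in proportion to the task score.

    prior_logp: log-likelihood of a molecule under the frozen pre-trained model.
    score:      task-specific score in [0, 1] (e.g. predicted target activity).
    sigma:      hypothetical scaling factor that weights the score.
    """
    return prior_logp + sigma * score


def reinvent_style_loss(agent_logp: float, prior_logp: float,
                        score: float, sigma: float) -> float:
    """Squared gap between the augmented target and the agent's log-likelihood."""
    target = augmented_log_likelihood(prior_logp, score, sigma)
    return (target - agent_logp) ** 2


def batch_loss(samples, sigma: float) -> float:
    """Mean loss over (agent_logp, prior_logp, score) triples."""
    return mean(reinvent_style_loss(a, p, s, sigma) for a, p, s in samples)


# Illustrative numbers only: two sampled molecules with the same prior
# likelihood and score, but different agent likelihoods.
example_batch = [(-20.0, -22.0, 0.5), (-17.0, -22.0, 0.5)]
example_loss = batch_loss(example_batch, sigma=10.0)
```

In a real training loop the log-likelihoods would come from the prior and agent networks and the loss would be minimized by gradient descent on the agent's parameters; the sketch only shows how the score reshapes the target.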

The model remains competitive to this day, and the ecosystem has since expanded to cover other drug design-related tasks. However, challenges such as data quality, synthetic feasibility, and generation reliability persist.

This project aims to explore and refine methods for next-generation drug discovery models, focusing on three topics: training new generalized foundation language models for multiple molecular design tasks, using reinforcement learning to fine-tune 3D generative flow-matching models, and investigating how data size and diversity affect the pre-training of molecular generators.

These efforts will contribute to improving the power, efficiency and reliability of open-source AI models for drug design.