The European High Performance Computing Joint Undertaking (EuroHPC JU)

Multimodal Foundation Model for Structure-Aware Variant Effect Prediction

Awarded Resources: 42,000 node hours
System Partition: Meluxina
Allocation Period: February 2026 - August 2026

Interpreting the functional consequences of amino-acid substitutions remains a major challenge in genomics and molecular biology, particularly for variants of uncertain significance (VUS). Although recent protein language models (pLLMs) such as ESM-2 and ProtT5 have advanced sequence-based representation learning, their holistic sequence processing limits sensitivity to the local structural perturbations induced by single-residue mutations. PIRATE addresses this gap by developing a multimodal, mutation-aware foundation model that integrates protein sequence, 3Di-derived structural descriptors, and explicit mutation embeddings within a unified transformer architecture. The model is pre-trained using a contrastive learning framework that captures fine-grained relationships between sequence, mutation context, and three-dimensional structural divergence, leveraging large-scale datasets of experimentally resolved protein structures. After pre-training, PIRATE will be fine-tuned on diverse Deep Mutational Scanning (DMS) datasets to produce quantitative, phenotype-specific predictions of variant effects on protein stability, catalytic activity, and molecular interactions. Performance will be rigorously evaluated against state-of-the-art approaches, including EVE, AlphaMissense, and current ESM-based predictors. By combining structural information with mutation-aware modelling at scale, PIRATE aims to deliver a significantly more accurate and biologically grounded variant effect predictor, supporting advances in biomedical research, precision diagnostics, and large-scale functional genomics.
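The contrastive pre-training idea described above can be illustrated with a minimal sketch. The summary does not state which contrastive objective PIRATE uses, so the InfoNCE-style loss below is an assumption chosen for illustration. The function name `info_nce_loss`, the temperature value, and the pairing of a sequence-plus-mutation embedding with a structure embedding are likewise hypothetical and are not taken from the project.

```python
import numpy as np


def info_nce_loss(seq_mut_emb, struct_emb, temperature=0.1):
    """InfoNCE-style contrastive loss over a batch of paired embeddings.

    Assumption (not from the project summary): row i of `seq_mut_emb` is an
    embedding of a sequence with its mutation, and row i of `struct_emb` is
    the embedding of the matching 3Di-derived structural descriptor. Every
    other row in the batch serves as a negative example.
    """
    # L2-normalise so the dot product becomes cosine similarity.
    a = seq_mut_emb / np.linalg.norm(seq_mut_emb, axis=1, keepdims=True)
    p = struct_emb / np.linalg.norm(struct_emb, axis=1, keepdims=True)

    # Pairwise similarity matrix of shape (batch, batch), scaled by temperature.
    logits = (a @ p.T) / temperature

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The correct pairs sit on the diagonal; average their negative log-probability.
    return float(-np.mean(np.diag(log_probs)))


if __name__ == "__main__":
    # Toy data: four orthogonal embeddings, first correctly paired, then mispaired.
    emb = np.eye(4)
    matched = info_nce_loss(emb, emb)
    mismatched = info_nce_loss(emb, np.roll(emb, 1, axis=0))
    print(f"matched pairs:    {matched:.4f}")
    print(f"mismatched pairs: {mismatched:.4f}")
```

In this toy setting the loss is close to zero when each sequence-and-mutation embedding lines up with its own structure embedding, and it grows when the pairs are shuffled. A contrastive objective provides this kind of signal for aligning modalities.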