The European High Performance Computing Joint Undertaking (EuroHPC JU)

CAERRA-ML – Development of Machine Learning-based Methods for Downscaling the ERA5 Global Reanalysis to European and Arctic region

Awarded Resources (in node hours): 90000
System Partition: LUMI-G
Allocation Period: December 2025 - June 2026

The CAERRA-ML project will develop a machine-learning (ML) based system for downscaling the ERA5 global reanalysis to provide regional reanalyses for Europe (CERRA) and the Arctic (CARRA). The focus is on developing a novel operational component that complements existing high-resolution weather reanalysis services. The core objective is to develop and train a set of ML models that downscale the ~30 km ERA5 global reanalysis to the ~5 km European (CERRA) and ~2.5 km Arctic (CARRA-East, CARRA-West) domains. These models will then be deployed to provide rapid estimates of CERRA and CARRA for a subset of essential variables, such as 2-meter temperature and precipitation, bridging the gap between the rapidly available but coarse-resolution ERA5T data (the preliminary version of ERA5) and the high-resolution data that normally arrives with a two-month delay.

Technically, the downscaling system will be developed within the Anemoi framework, a collaborative effort to implement and provide easy access to best ML practices and tools (Zarr, PyTorch Lightning, Hydra, MLflow) for meteorological purposes. The system will use Graph and Graph-Transformer Neural Network architectures trained as a diffusion model, which makes it capable of generating ensembles and quantifying uncertainty. Two complementary approaches will be tested: an autoregressive method, which integrates information from recent regional analyses to ensure temporal consistency, and a direct downscaling method based solely on ERA5 inputs.

To guarantee robustness, the ML-based downscaling system will undergo rigorous evaluation by an independent team against the original dynamical reanalyses and observational datasets. Advanced verification metrics will be applied to assess spatial, temporal, and spectral consistency, as well as the representation of extremes and variability. This will ensure that the final system meets Copernicus quality standards and provides users with a reliable, scientifically sound extension of existing regional reanalyses, with any shortcomings inherent to ML systems clearly documented.
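
The abstract states that a diffusion-based model can generate ensembles and quantify uncertainty. As a rough illustration of what that means in practice, the sketch below summarizes a set of sampled fields into a per-grid-point mean and spread. It is not the project's code and does not use the Anemoi API: the function name `ensemble_summary`, the flat list-of-values layout, and the toy temperature values are all assumptions made for this example.

```python
from statistics import mean, pstdev


def ensemble_summary(members):
    """Return (means, spreads) per grid point for equally sized ensemble members.

    `members` is a list of samples; each sample is a list of values, one per
    grid point. The spread is the population standard deviation across members.
    """
    if not members:
        raise ValueError("need at least one ensemble member")
    n_points = len(members[0])
    if any(len(m) != n_points for m in members):
        raise ValueError("all members must cover the same grid points")

    # Transpose so each column holds every member's value at one grid point.
    columns = list(zip(*members))
    means = [mean(col) for col in columns]
    spreads = [pstdev(col) for col in columns]
    return means, spreads


# Hypothetical 2-meter temperatures (degrees C) at three grid points,
# as if four samples had been drawn from a generative downscaling model.
toy_members = [
    [1.0, 5.0, -2.0],
    [3.0, 5.0, -4.0],
    [1.0, 5.0, -2.0],
    [3.0, 5.0, -4.0],
]

means, spreads = ensemble_summary(toy_members)
print("ensemble mean:  ", means)
print("ensemble spread:", spreads)
```

A spread of zero at a grid point means every sampled member agrees there, while a larger spread flags locations where the downscaled estimate is less certain.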