The European High Performance Computing Joint Undertaking (EuroHPC JU)

Slovene Sovereign AI – Development of Large Language Model for Slovene

Awarded Resources (in node hours): 40000
System Partition: Leonardo BOOSTER
Allocation Period: December 2025 - June 2026

Extremely large language models (LLMs), such as GPT and Gemini, have shown remarkable advances on some tasks and triggered an avalanche of developments in artificial intelligence applications. Unfortunately, due to their closed and non-transparent nature, high computational requirements, and high cost of customization, they are out of reach for most research organizations and companies. Their open-source alternatives, such as Gemma and LLaMa, achieve almost the same performance in English. However, their knowledge of less-resourced languages and cultures (such as Slovene) is still superficial. This project aims to develop the next generation of an open-source model for Slovene called GaMS. The model will be the basis for further adaptations to specific application needs and will also be available for wider academic and industrial use. The model will be obtained by continued pretraining of the Gemma 3 open model on more than 100 billion tokens, followed by instruction tuning on over 300k high-quality examples. The project is technically supported by NVIDIA as part of their Sovereign AI initiative.