T2ARC - Textual-to-Abstract Reasoning Compression - The European High Performance Computing Joint Undertaking (EuroHPC JU)

Awarded Resources (in node hours): 38496

System Partition: Leonardo BOOSTER

Allocation Period: July 2026 - January 2027

Large Reasoning Models (LRMs), such as OpenAI o1 or DeepSeek-R1, can solve complex problems by "thinking" before generating the actual answer: they decompose the reasoning process by autoregressively generating a Chain of Thought (CoT), i.e., a (textual) sequence of intermediate deduction steps. Although this procedure has recently led to surprising results, the generated CoTs are usually very long, spanning thousands or even tens of thousands of tokens. This slows down LRM response times and, above all, causes enormous energy consumption. Furthermore, several recent studies show that these very long CoTs are often redundant, full of unnecessary details and verbose explanations. In contrast, human reasoning is often more abstract and not always entirely verbal.

The purpose of the proposed project T2ARC (Textual-to-Abstract Reasoning Compression) is to develop and test a new method for abstract reasoning with LRMs. Specifically, the project aims to fine-tune existing base LRMs so that they generate latent CoTs, i.e., concise sequences of continuous vectors that represent the deduction steps in an abstract latent space. To do so, the team takes inspiration from Multimodal LLMs, in which, e.g., visual knowledge is compressed and "translated" into textual knowledge, and proposes to compress textual CoTs, "translating" the textual reasoning level into abstract, latent reasoning. The project will use a Compressor network for this "translation", which will represent each CoT generated by an LRM as a sequence of embedding vectors. This sequence is then used as the target for a Supervised Fine-Tuning stage, followed by a Reinforcement Learning stage.

The use of EuroHPC resources is essential for this project, as training involves LLMs with billions of parameters, and the Reinforcement Learning phase (crucial for LRM accuracy) is particularly computationally intensive.
T2ARC will release all trained models and developed code and will contribute to strengthening European research into more efficient and sustainable LRMs, in line with European projects such as ELIAS (of which the PI and co-PI of this proposal are members).
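As a toy illustration of the compression idea described above, the sketch below shortens a sequence of CoT token embeddings into fewer continuous vectors and scores a predicted latent CoT against that compressed target. It is not the project's Compressor: the abstract does not specify the architecture or the loss, so the fixed compression `ratio`, the mean-pooling stand-in for a learned network, the mean-squared-error objective, and the helper names `compress_cot` and `latent_sft_loss` are all hypothetical assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def compress_cot(token_embeddings: List[Vector], ratio: int) -> List[Vector]:
    """Hypothetical stand-in for a learned Compressor network: mean-pool every
    `ratio` consecutive token embeddings into one latent vector (the last chunk
    may be shorter). A real compressor would be trained, not a fixed pooling."""
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    latents: List[Vector] = []
    for start in range(0, len(token_embeddings), ratio):
        chunk = token_embeddings[start:start + ratio]
        dim = len(chunk[0])
        # Average each embedding dimension across the tokens in this chunk.
        latents.append([sum(vec[d] for vec in chunk) / len(chunk) for d in range(dim)])
    return latents


def latent_sft_loss(predicted: List[Vector], target: List[Vector]) -> float:
    """Assumed SFT objective: mean squared error between the latent CoT predicted
    by the model and the compressed target sequence, averaged over all entries."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target sequences must have the same length")
    total, count = 0.0, 0
    for p, t in zip(predicted, target):
        for a, b in zip(p, t):
            total += (a - b) ** 2
            count += 1
    if count == 0:
        raise ValueError("sequences must be non-empty")
    return total / count


# Six 2-dimensional "token embeddings" compressed 3:1 into two latent vectors.
tokens = [[1.0, 0.0], [3.0, 0.0], [5.0, 3.0], [0.0, 2.0], [2.0, 4.0], [4.0, 6.0]]
latents = compress_cot(tokens, ratio=3)
print(latents)  # [[3.0, 1.0], [2.0, 4.0]]
print(latent_sft_loss([[3.0, 1.0], [2.0, 2.0]], latents))  # 1.0
```

The point the toy example makes is the length reduction: the target the model must produce has `ceil(n / ratio)` vectors instead of `n` tokens, which is where the hoped-for savings in response time and energy would come from.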

Principal Investigator, Company and Country

Enver Sangineto, University of Modena and Reggio Emilia, Italy