This project, "Bio-Plausible Spiking Neural Networks for Energy-Efficient Automatic Speech Recognition," addresses the critical challenge of high energy consumption in deep learning models for speech processing. While conventional Artificial Neural Networks (ANNs) have achieved state-of-the-art performance in tasks like Automatic Speech Recognition (ASR), their massive computational cost and power draw make them unsuitable for deployment on power-constrained edge devices. The project aims to bridge the performance gap between ANNs and power-efficient Spiking Neural Networks (SNNs). SNNs communicate with sparse, binary spikes, enabling orders-of-magnitude lower energy consumption at inference, especially on specialized neuromorphic hardware where multiplications are reduced to simple additions.

The team's core methodology involves developing novel hybrid ANN-SNN architectures, leveraging the powerful feature-extraction capabilities of pre-trained ANNs (e.g., Whisper) while replacing power-hungry components (such as transformer decoders) with simpler, recurrent Leaky Integrate-and-Fire (LIF) SNNs for decoding. Unlike much existing SNN research, which relies on complex custom neurons, this project will focus on simpler, more generalizable architectures built with tools such as PyTorch, SpeechBrain, and snnTorch. The expected impact is twofold:
- Technological: This project will deliver a framework for high-performance, low-power ASR, enabling the deployment of advanced speech models directly on edge devices. This supports data privacy and sovereignty, aligning with the principles of the EU AI Act by eliminating the need to send sensitive data to a server.
- Scientific: By analysing the behavioural dynamics of these hybrid networks in collaboration with neuroscience experts, the project will yield a deeper, more explainable understanding of SNN computation and its parallels to biological neural processing. Access to EuroHPC GPU resources is essential to explore the vast architectural search space and to manage the sequential nature of SNN training on large-scale datasets such as LibriSpeech, a task infeasible on local clusters.
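To make the spiking mechanism concrete, the leaky integrate-and-fire dynamics underlying the proposed SNN decoder can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the project's implementation (which would use snnTorch); the decay factor `beta`, the firing `threshold`, and the soft-reset rule are assumed example choices.

```python
def lif_step(x, mem, beta=0.9, threshold=1.0):
    """One leaky integrate-and-fire update (illustrative parameters).

    The membrane potential `mem` decays by factor `beta`, integrates the
    input current `x`, and emits a binary spike when it crosses
    `threshold` (soft reset: the threshold is subtracted after a spike).
    """
    mem = beta * mem + x
    spike = 1 if mem >= threshold else 0
    if spike:
        mem -= threshold
    return spike, mem

# Drive the neuron with a constant sub-threshold input and collect
# its spike train over ten time steps.
mem, spikes = 0.0, []
for _ in range(10):
    spk, mem = lif_step(0.3, mem)
    spikes.append(spk)
# The output is sparse and binary, e.g. [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]:
# most time steps carry no spike, which is the source of the energy
# savings on neuromorphic hardware mentioned above.
```

The sparsity of the resulting spike train, rather than dense floating-point activations, is what allows inference energy to scale with the number of spikes instead of the number of connections.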
Themos Stafylakis, Athens University of Economics and Business, Greece