Multi-Teacher Knowledge Distillation for Lightweight Speech Interaction in Embedded Educational Robots
Abstract
Educational robots have significant potential to improve the learning experience and learning efficiency through natural, real-time voice interaction. However, mainstream end-to-end voice interaction models have large parameter counts and high computational costs, which makes them difficult to deploy efficiently on resource-limited embedded educational robot platforms. The average inference delay is 810 ms, which seriously impairs real-time interaction. Moreover, traditional compression methods sacrifice understanding accuracy in complex scenarios, and the representation ability of small-scale models is limited. To this end, this study proposes a method for constructing a lightweight speech interaction system based on knowledge distillation. A deep neural network pre-trained on a large-scale general corpus serves as the teacher model, and a multi-level knowledge transfer mechanism is established with three components: differential masking to guide the learning of key features, a relationship information extraction module to capture global correlations, and a hierarchical loss function to balance the distillation weights. The core knowledge of the teacher model is distilled into a lightweight student model oriented toward educational scenarios. The final student model contains only 20% of the teacher model's parameters and maintains high accuracy on a benchmark test set that simulates real educational environments. The speech recognition error rate is as low as 15.8% (12.6 percentage points lower than that of a small model of the same scale trained directly), the inference delay is reduced by 38%, from 810 ms to 500 ms, breaking through the real-time threshold for educational human-computer interaction, and the model storage footprint is compressed by more than 80% (to under 350 MB).
The model runs efficiently on low-power hardware platforms and effectively balances accuracy and efficiency in educational robot voice interaction, improving real-time performance, robustness, and practicality, and providing reliable technical support for wide application in various educational scenarios.
DOI: https://doi.org/10.31449/inf.v50i5.12129
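The abstract describes a loss that balances several distillation signals but does not give its form. As a rough illustration only, the sketch below combines three generic terms: a temperature-softened KL term between teacher and student outputs, a hard-label cross-entropy term, and a masked feature-matching term. The function names, the temperature, and the weights `w_soft`, `w_hard`, and `w_feat` are hypothetical choices made for this example, not values from the paper, and the relationship information extraction module is not modeled.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def masked_feature_mse(student_feat, teacher_feat, mask):
    """Mean squared error over the feature positions selected by a 0/1 mask."""
    selected = [(s - t) ** 2
                for s, t, m in zip(student_feat, teacher_feat, mask) if m]
    return sum(selected) / len(selected) if selected else 0.0

def distillation_loss(student_logits, teacher_logits, label,
                      student_feat, teacher_feat, mask,
                      temperature=2.0, w_soft=0.5, w_hard=0.3, w_feat=0.2):
    """Weighted sum of soft-label, hard-label, and masked-feature terms.

    All weights and the temperature are illustrative assumptions.
    """
    # Soft-label term: match the teacher's softened output distribution.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = temperature ** 2 * kl_divergence(p_teacher, p_student)
    # Hard-label term: cross-entropy against the ground-truth class index.
    hard = -math.log(softmax(student_logits)[label])
    # Feature term: match teacher features only where the mask is set.
    feat = masked_feature_mse(student_feat, teacher_feat, mask)
    return w_soft * soft + w_hard * hard + w_feat * feat
```

When the student reproduces the teacher's logits and features exactly, the soft-label and feature terms vanish and only the hard-label term remains, so the weights control how strongly the student is pulled toward the teacher versus the ground-truth labels.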
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







