5G-Optimized Deep Learning Framework for Real-Time Multilingual Speech-to-Speech Translation in Telemedicine Systems
Abstract
Telemedicine has revolutionized healthcare by enabling virtual consultations, yet it still faces challenges from linguistic barriers and the need for real-time, scalable communication. Current systems typically address isolated tasks such as speech recognition or symptom classification and lack a unified solution for multilingual doctor-patient interactions. To address this, we present a 5G-Optimized Deep Learning Framework that integrates advanced speech recognition, neural machine translation, and text-to-speech synthesis into a seamless Speech-to-Speech Workflow (STSW). Specifically, our framework uses a fine-tuned OpenAI Whisper model for speech recognition, a MarianMT model fine-tuned on multilingual medical corpora for translation, and Tacotron 2-based neural TTS for speech synthesis. Each model is domain-adapted to handle complex medical terminology. We implement the framework over 5G-enabled edge computing infrastructure, ensuring real-time performance with ultra-low latency. Experimental results demonstrate the effectiveness of the proposed system, achieving a Word Error Rate (WER) of 0.12, a BLEU score of 0.85 for translation quality, and a Mean Opinion Score (MOS) of 4.5 for the naturalness of synthesized speech. Furthermore, our framework delivers an end-to-end latency of 2.1 seconds, outperforming existing approaches. This integration bridges communication gaps in telemedicine, enabling accurate multilingual conversations and scalable healthcare delivery across diverse geographies.
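The abstract describes a three-stage cascade (speech recognition, then translation, then speech synthesis). As a minimal sketch of how such a pipeline can be wired together, the Python below chains generic off-the-shelf Hugging Face checkpoints; the model names are illustrative stand-ins, not the authors' domain-adapted weights, and the TTS stage is a labeled placeholder for the paper's Tacotron 2-based synthesizer.

```python
# Minimal sketch of a cascaded speech-to-speech workflow (ASR -> MT -> TTS).
# Checkpoints are generic public stand-ins, not the paper's fine-tuned models.
from transformers import pipeline

# Stage 1: speech recognition (the paper fine-tunes OpenAI Whisper).
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# Stage 2: neural machine translation (the paper fine-tunes MarianMT on
# medical corpora; this is a generic Spanish-to-English Marian checkpoint).
mt = pipeline("translation", model="Helsinki-NLP/opus-mt-es-en")

def synthesize(text: str) -> bytes:
    """Placeholder for the Tacotron 2-based TTS stage; any neural TTS
    engine (e.g. a Tacotron 2 + vocoder stack) could be plugged in here."""
    raise NotImplementedError

def speech_to_speech(audio_path: str) -> bytes:
    source_text = asr(audio_path)["text"]                  # speech -> text
    target_text = mt(source_text)[0]["translation_text"]   # text -> text
    return synthesize(target_text)                         # text -> speech
```

In a deployment like the one the paper describes, each stage would run on 5G-enabled edge nodes so that the cascade stays within the reported 2.1-second end-to-end budget.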
DOI: https://doi.org/10.31449/inf.v49i2.7826

This work is licensed under a Creative Commons Attribution 3.0 License.