SU, Tong; HU, Cuihua. ESVA: Enhancing Multimodal Emotion Recognition via Multi-Scale Audio Feature Extraction and Cross-Modal Temporal Alignment. Informatica, [S. l.], v. 49, n. 31, 2025. DOI: 10.31449/inf.v46i31.12043. Available at: https://www.informatica.si/index.php/informatica/article/view/12043. Accessed: 24 Jan. 2026.