Enhanced YOLOv11 for Robust Real-Time Skiing Action Recognition via Multimodal and Spatiotemporal Learning

Abstract

This paper proposes an enhanced YOLOv11 model for real-time skiing action recognition. The model incorporates five architectural improvements: spatiotemporal modeling, adaptive channel attention (ACA), hybrid convolution blocks, dynamic-aware pooling, and multi-scale feature fusion. It is evaluated on the proprietary SnowAction dataset, which contains over 100,000 annotated video segments recorded under diverse weather and terrain conditions. Comparative experiments show that the enhanced YOLOv11 achieves 94.5% accuracy on sliding actions, 7.2% higher than YOLOv4, and runs at 55.2 FPS at 640×480 resolution. In cross-model benchmarks, it surpasses CNN-LSTM, 3D CNN, and Transformer models in precision, recall, and inference speed, and it remains robust in adverse weather. These results indicate that the enhanced YOLOv11 is a reliable solution for recognizing highly dynamic actions in skiing scenarios.

Authors

  • Dong Liu, School of Physical Education, Suihua University
  • Minghai Ju, School of Physical Education, Suihua University

DOI:

https://doi.org/10.31449/inf.v49i3.9307

Published

09/12/2025

How to Cite

Liu, D., & Ju, M. (2025). Enhanced YOLOv11 for Robust Real-Time Skiing Action Recognition via Multimodal and Spatiotemporal Learning. Informatica, 49(3). https://doi.org/10.31449/inf.v49i3.9307

Issue

Vol. 49, No. 3 (2025)

Section

Regular papers