Enhanced YOLOv11 for Robust Real-Time Skiing Action Recognition via Multimodal and Spatiotemporal Learning
Abstract
This paper proposes an enhanced YOLOv11 model for real-time skiing action recognition that incorporates five architectural improvements: spatiotemporal modeling, adaptive channel attention (ACA), hybrid convolution blocks, dynamic-aware pooling, and multi-scale feature fusion. The model is evaluated on the proprietary SnowAction dataset, which contains over 100,000 annotated video segments recorded under diverse weather and terrain conditions. In comparative experiments, the enhanced YOLOv11 achieves 94.5% accuracy on sliding actions, 7.2% higher than YOLOv4, and runs at 55.2 FPS at 640×480 resolution. In cross-model benchmarks, it surpasses CNN-LSTM, 3D CNN, and Transformer models in precision, recall, and inference speed, demonstrating strong real-time capability and robustness in adverse weather. These results establish the enhanced YOLOv11 as a reliable solution for highly dynamic action recognition in skiing scenarios.
DOI: https://doi.org/10.31449/inf.v49i3.9307
This work is licensed under a Creative Commons Attribution 3.0 License.








