Enhanced YOLOv11 for Robust Real-Time Skiing Action Recognition via Multimodal and Spatiotemporal Learning
Abstract
This paper proposes an enhanced YOLOv11 model for real-time skiing action recognition, incorporating five key architectural improvements: spatiotemporal modeling, adaptive channel attention (ACA), hybrid convolution blocks, dynamic-aware pooling, and multi-scale feature fusion. The model is evaluated on the proprietary SnowAction dataset, which includes over 100,000 annotated video segments under diverse weather and terrain conditions. Comparative experiments demonstrate that YOLOv11 achieves 94.5% accuracy on sliding actions, 7.2% higher than YOLOv4, and attains 55.2 FPS at 640×480 resolution. In cross-model benchmarks, YOLOv11 surpasses CNN-LSTM, 3D CNN, and Transformer models in precision, recall, and inference speed, showing strong real-time capability and robustness in adverse weather. These results establish YOLOv11 as a reliable solution for high-dynamic action recognition tasks in skiing scenarios.DOI:
https://doi.org/10.31449/inf.v49i3.9307Downloads
Published
How to Cite
Issue
Section
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







