Improved Multi-Target Athlete Tracking in Sports Videos Using IYOLOv8-MTD and Enhanced DeepSORT with Hybrid Attention and IMM
Abstract
The data-driven development of competitive sports places higher demands on the precise capture and analysis of athletes' movements. To improve the accuracy and continuity of multi-target detection and tracking in sports scenes, this article constructs a multi-target detection model based on an improved YOLOv8 (IYOLOv8-MTD) and a multi-target tracking model based on an improved DeepSORT (IDeepSORT-MTT), and improves performance through the joint optimization of several modules. In the detection module (IYOLOv8-MTD), the convolutional block attention module (CBAM) is optimized with the global context transformer (GCT) to enhance key feature responses, the large selective kernel (LSK) module is introduced to rebuild the C2f module so that it adapts to multi-scale targets, and the Inner Intersection over Union (IIoU) and minimum point distance IoU (MPDIoU) terms are used in the loss function to improve bounding-box regression accuracy. In the tracking module (IDeepSORT-MTT), an interacting multiple model (IMM) Kalman filter fuses constant-velocity and constant-acceleration motion models to adapt to the nonlinear motion of targets, a hybrid attention mechanism (weighted fusion of channel and spatial features) is designed to make appearance features more discriminative, and a heat-map detector assists localization to reduce positioning error. The experiments are run on the SportsMOT dataset (240 videos with about 150,000 frames in total, split 8:1:1 into a training set of 192 videos with 120,000 frames, a validation set of 24 videos with 15,000 frames, and a test set of 24 videos with 15,000 frames). The hardware platform is an NVIDIA GeForce RTX 2080Ti GPU and an Intel i5-10400F CPU, and evaluation uses the standard MOTEval tool.
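The abstract names MPDIoU as one of the bounding-box regression terms but does not give its formula. The following is a minimal sketch of the loss as MPDIoU is commonly defined (IoU penalized by the squared distances between matching corners, normalized by the input image size). The function name `mpdiou_loss`, the `(x1, y1, x2, y2)` box format, and the use of plain floats are assumptions made for illustration. This is not the authors' implementation, and the Inner-IoU term is not included.

```python
def mpdiou_loss(pred, target, img_w, img_h):
    """Hypothetical MPDIoU loss for one pair of boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2, in pixels.
    img_w and img_h are the width and height of the input image.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h

    # Union area and plain IoU.
    area_p = (px2 - px1) * (py2 - py1)
    area_t = (tx2 - tx1) * (ty2 - ty1)
    union = area_p + area_t - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between top-left corners and between bottom-right corners.
    d_top_left = (px1 - tx1) ** 2 + (py1 - ty1) ** 2
    d_bottom_right = (px2 - tx2) ** 2 + (py2 - ty2) ** 2

    # Normalize by the squared diagonal of the input image.
    norm = img_w ** 2 + img_h ** 2
    mpdiou = iou - d_top_left / norm - d_bottom_right / norm
    return 1.0 - mpdiou
```

Under this definition the loss is zero only when the two boxes coincide, and it keeps growing with corner distance even when the boxes do not overlap, which is where a plain IoU loss stops providing a useful signal.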
The results show that the detection model IYOLOv8-MTD achieves a mAP50 of 97.14% and a mAP50-95 of 92.22%, compared with 92.78% and 81.67% for the baseline YOLOv8. The tracking model IDeepSORT-MTT achieves an average multi-target tracking accuracy (MOTA) of 92.81% and an average identity F1 score (IDF1) of 77.56%, reduces the number of identity switches by 66.3% compared with the original DeepSORT, and maintains a processing speed of 7.1-8.0 frames per second (FPS). Overall, the model outperforms traditional methods and the compared studies, effectively improving the accuracy and continuity of multi-target detection and tracking in complex sports scenarios and providing a reliable technical solution for athlete trajectory analysis, tactical review and physical fitness assessment.
DOI: https://doi.org/10.31449/inf.v49i35.10859
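The tracking metrics reported in the abstract (MOTA, IDF1, and the relative reduction in identity switches) can be illustrated with a short sketch based on their standard definitions. The function names and all counts used with them are hypothetical. The abstract reports only the final percentages, averaged over sequences, and the paper's evaluation uses the MOTEval tool rather than code like this.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


def relative_reduction(baseline, improved):
    """Fractional reduction of a count relative to a baseline count."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


# Hypothetical counts, chosen only to show how the formulas are applied.
example_mota = mota(false_negatives=50, false_positives=30,
                    id_switches=20, num_gt=1000)
example_idf1 = idf1(idtp=80, idfp=20, idfn=20)
example_drop = relative_reduction(baseline=1000, improved=337)
```

MOTA counts each identity switch as a single error alongside misses and false positives, so it is dominated by detection quality, while IDF1 measures how consistently identities are preserved over whole trajectories. This is why a tracker can have a high MOTA together with a noticeably lower IDF1, as in the reported results.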
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







