A ViT-DQN-Based Real-Time Martial Arts Training System with Multimodal Fusion for Action Recognition and Optimization
Abstract
This paper presents an intelligent martial arts training system that integrates computer vision and reinforcement learning to address the inefficiency, lack of personalization, and delayed feedback of traditional martial arts instruction. The system employs a Vision Transformer (ViT) for real-time action recognition and a Deep Q-Network (DQN) for training-strategy optimization, enabling precise, adaptive feedback for athletes. By combining deep learning with IoT sensor data, the system analyzes posture, movement accuracy, and exercise intensity in real time to maximize training effectiveness. A large-scale experiment involving 200 martial arts practitioners across multiple age groups showed that the system achieved high recognition accuracy for key movements (96.8% for chopping, 98.1% for kicking, and 96.8% for grappling), significantly outperforming traditional CNN- and LSTM-based models. In fluency optimization, the DQN model surpassed PPO and A3C, achieving near-perfect fluency scores for the chopping and side-kick movements. Moreover, athletes using the system achieved notable improvements in competitive outcomes: the under-18 group's win rate rose from 65% to 85%, while the 23–27 age group improved from 75% to 90%. These findings validate the system's effectiveness in improving training efficiency and technical precision and demonstrate the potential of artificial intelligence for intelligent martial arts instruction and broader sports-training applications.
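To make the pipeline described in the abstract concrete, the sketch below shows one plausible way the two components could fit together: a ViT-style encoder classifies the current video frame, and its CLS-token feature is fused with an IoT sensor vector to form the state of a DQN that selects a training adjustment. This is a minimal illustrative sketch, not the paper's implementation; the class names, network sizes, the four-movement label set, the feedback-action set, and the eight-dimensional sensor vector are all assumptions made for demonstration.

```python
import torch
import torch.nn as nn

# Hypothetical label sets: the paper's actual movement vocabulary and
# training-adjustment actions are not specified in the abstract.
ACTIONS = ["chop", "kick", "grapple", "side_kick"]
FEEDBACK = ["increase_intensity", "decrease_intensity", "repeat_drill", "advance_drill"]


class ViTActionRecognizer(nn.Module):
    """Toy ViT: patch embedding + Transformer encoder + classification head."""

    def __init__(self, img_size=224, patch=16, dim=192, depth=4, heads=3):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, len(ACTIONS))

    def forward(self, frames):                      # frames: (B, 3, H, W)
        x = self.patch_embed(frames).flatten(2).transpose(1, 2)  # (B, N, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = self.encoder(torch.cat([cls, x], dim=1) + self.pos_embed)
        feat = x[:, 0]                              # CLS token as the action feature
        return self.head(feat), feat


class DQNPolicy(nn.Module):
    """Q-network over a fused state: ViT feature + IoT sensor reading
    (e.g. heart rate, IMU values); outputs one Q-value per feedback action."""

    def __init__(self, vit_dim=192, sensor_dim=8):
        super().__init__()
        self.q = nn.Sequential(
            nn.Linear(vit_dim + sensor_dim, 128), nn.ReLU(),
            nn.Linear(128, len(FEEDBACK)),
        )

    def forward(self, vit_feat, sensors):
        return self.q(torch.cat([vit_feat, sensors], dim=-1))


if __name__ == "__main__":
    recognizer, policy = ViTActionRecognizer(), DQNPolicy()
    frame = torch.randn(1, 3, 224, 224)             # one video frame
    sensors = torch.randn(1, 8)                     # placeholder IoT sensor vector
    logits, feat = recognizer(frame)
    q_values = policy(feat, sensors)
    print("action:", ACTIONS[logits.argmax(-1).item()],
          "| feedback:", FEEDBACK[q_values.argmax(-1).item()])
```

Reusing the recognizer's CLS feature as part of the DQN state is one simple way to realize the multimodal fusion the abstract describes; an actual DQN would additionally need a reward signal (e.g. a fluency score), a replay buffer, and a target network for stable training.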
DOI: https://doi.org/10.31449/inf.v49i28.8606







