Optimizing Long-Term User Engagement in Short-Video Recommendation via Reinforcement Learning: A Markov Decision Process Framework with Composite Rewards

Abstract

On short-video platforms, conventional recommendation algorithms tend to optimize short-term indicators at the expense of long-term user behavior, and this shortcoming is increasingly apparent. To address it, we use Deep Reinforcement Learning (DRL) to build a recommendation framework that combines deep feature engineering, policy updating, and online interaction. We cast the recommendation process as a Markov Decision Process (MDP) so that the agent improves the user experience by maximizing long-term user value. Experiments show that, relative to baselines such as matrix-factorization-based collaborative filtering (MF) and deep neural networks (DNN), the DRL agent performs markedly better on key long-term engagement indicators, including an improvement of more than 22% in average session time. In addition, an ablation study of the reward function indicates that a composite reward needs both immediate and delayed signals to learn a good policy. These findings have implications for improving short-video recommendation and suggest a research direction for the recommender systems community: moving from short-term metrics towards maximizing long-term user value.
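
To make the idea of a composite reward concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not give the reward terms or their weights, so the signals (`watch_ratio`, `liked`, `returned_next_day`), the weights, and the discount factor `gamma` are all hypothetical choices made for illustration. The sketch shows how an immediate and a delayed signal might be blended into one per-step reward, and how an MDP objective would sum those rewards over a session.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One recommendation step: a video shown and the feedback observed.

    All three fields are assumed signals, not taken from the paper.
    """
    watch_ratio: float        # fraction of the video watched, in [0, 1] (immediate)
    liked: bool               # explicit positive feedback (immediate)
    returned_next_day: bool   # whether the user came back later (delayed)


def composite_reward(step: Step,
                     w_immediate: float = 0.5,
                     w_delayed: float = 0.5) -> float:
    """Blend immediate and delayed signals into a single scalar reward.

    The 0.8 / 0.2 split and the default weights are illustrative assumptions.
    """
    immediate = 0.8 * step.watch_ratio + 0.2 * float(step.liked)
    delayed = float(step.returned_next_day)
    return w_immediate * immediate + w_delayed * delayed


def discounted_return(rewards: List[float], gamma: float = 0.9) -> float:
    """Discounted sum of rewards over a session: r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    session = [
        Step(watch_ratio=0.9, liked=True, returned_next_day=True),
        Step(watch_ratio=0.4, liked=False, returned_next_day=True),
        Step(watch_ratio=1.0, liked=False, returned_next_day=False),
    ]
    rewards = [composite_reward(s) for s in session]
    print("per-step rewards:", rewards)
    print("session return:", discounted_return(rewards))
```

Setting `w_delayed=0.0` or `w_immediate=0.0` in this sketch gives a reward built from only one kind of signal, which is the sort of variant a reward-function ablation would compare against the full composite.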

Authors

  • Juan Di, Jinzhong University

DOI:

https://doi.org/10.31449/inf.v50i13.13064

Published

05/18/2026

How to Cite

Di, J. (2026). Optimizing Long-Term User Engagement in Short-Video Recommendation via Reinforcement Learning: A Markov Decision Process Framework with Composite Rewards. Informatica, 50(13). https://doi.org/10.31449/inf.v50i13.13064