Optimizing Long-Term User Engagement in Short-Video Recommendation via Reinforcement Learning: A Markov Decision Process Framework with Composite Rewards
Abstract
On short-video platforms, conventional recommendation algorithms that optimize short-term indicators at the expense of long-term user behavior show increasingly obvious shortcomings. To address this, we use Deep Reinforcement Learning (DRL) to develop an intelligent recommendation framework built on deep feature engineering, policy updating, and online interaction. We cast the recommendation process as a Markov Decision Process (MDP) so that the agent improves the user experience by maximizing long-term user value. Experiments show that, relative to baselines such as matrix-factorization collaborative filtering (MF) and deep neural networks (DNN), our DRL agent achieves a clear lead on key long-term engagement indicators, including an improvement of more than 22% in average session time. In addition, an ablation study of the reward function confirms that a composite reward needs both immediate and delayed signals to learn a good policy. These findings have implications for improving short-video recommendation and point to a research direction for the recommender systems community: shifting from short-term metrics towards maximizing long-term user value.
DOI: https://doi.org/10.31449/inf.v50i13.13064
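To make the idea of a composite reward concrete, the following is a minimal sketch of how immediate and delayed signals might be combined and then accumulated over a session as a discounted return. It is not the authors' implementation: the `Feedback` fields, the 0.8/0.2 mix inside the immediate term, the weights `w_immediate` and `w_delayed`, and the discount factor `gamma` are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Feedback:
    """Hypothetical per-video feedback; field names are assumptions."""
    watch_ratio: float        # fraction of the video watched, in [0, 1] (immediate)
    liked: bool               # explicit like on this video (immediate)
    session_continued: bool   # user kept watching afterwards (delayed)


def composite_reward(fb: Feedback,
                     w_immediate: float = 0.5,
                     w_delayed: float = 0.5) -> float:
    """Weighted sum of an immediate term and a delayed term (assumed form)."""
    immediate = 0.8 * fb.watch_ratio + 0.2 * float(fb.liked)
    delayed = float(fb.session_continued)
    return w_immediate * immediate + w_delayed * delayed


def discounted_return(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Discounted sum of rewards, G = r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    session = [
        Feedback(watch_ratio=0.9, liked=True, session_continued=True),
        Feedback(watch_ratio=0.4, liked=False, session_continued=True),
        Feedback(watch_ratio=0.1, liked=False, session_continued=False),
    ]
    rewards = [composite_reward(fb) for fb in session]
    print(rewards, discounted_return(rewards))
```

Setting `w_delayed=0.0` or `w_immediate=0.0` gives the single-signal variants that a reward ablation of the kind described in the abstract would compare against the composite form.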
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







