Optimizing Long-Term User Engagement in Short-Video Recommendation via Reinforcement Learning: A Markov Decision Process Framework with Composite Rewards

Juan Di

doi:10.31449/inf.v50i13.13064

Abstract

On fast-changing short-video platforms, conventional recommendation algorithms increasingly fall short because they over-optimize short-term indicators at the expense of long-term user behavior. To address this, we use Deep Reinforcement Learning (DRL) to build an intelligent recommendation framework that combines deep feature engineering, policy updating, and online interaction. We cast the recommendation process as a Markov Decision Process (MDP), so that the agent improves the user experience by maximizing long-term user value. Experimental results show that, compared with baseline models such as matrix-factorization collaborative filtering (MF) and deep neural networks (DNN), our DRL agent clearly outperforms them on key long-term engagement indicators, improving average session time by more than 22%. In addition, an ablation study of the reward function confirms that a composite reward combining both immediate and delayed signals is necessary for learning a good policy. These findings show how the intelligence of short-video recommendation can be improved and point to a new research direction for the recommender systems community: shifting from short-term metrics toward maximizing long-term user value.
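The abstract does not specify the reward signals, their weights, or the discount factor. As a minimal illustrative sketch, assuming a composite reward that is a weighted sum of hypothetical immediate signals (watch ratio, like) and a hypothetical delayed signal (the user returning later), the code below shows how such a reward feeds the discounted return that an MDP agent maximizes, and how zeroing the delayed term changes that return, which is the kind of comparison a reward ablation makes. All names, weights, and the value of gamma are assumptions, not the author's method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One recommendation step; all field names are hypothetical illustrations."""
    watch_ratio: float    # immediate signal: fraction of the video watched, in [0, 1]
    liked: bool           # immediate signal: explicit positive feedback
    returned_later: bool  # delayed signal: user came back later, assumed to be
                          # already attributed back to the step that earned it


def composite_reward(step: Step, w_watch: float = 1.0, w_like: float = 0.5,
                     w_return: float = 2.0) -> float:
    """Weighted sum of immediate and delayed signals (weights are assumptions)."""
    immediate = w_watch * step.watch_ratio + w_like * float(step.liked)
    delayed = w_return * float(step.returned_later)
    return immediate + delayed


def discounted_return(rewards: List[float], gamma: float = 0.9) -> float:
    """G_0 = sum_t gamma**t * r_t, the long-term value an MDP agent maximizes."""
    g = 0.0
    for r in reversed(rewards):  # backward recursion: G_t = r_t + gamma * G_{t+1}
        g = r + gamma * g
    return g


if __name__ == "__main__":
    # A hypothetical three-step session.
    session = [Step(1.0, True, False), Step(0.5, False, False), Step(0.2, False, True)]
    full = [composite_reward(s) for s in session]
    # Ablation-style variant: drop the delayed term by zeroing its weight.
    immediate_only = [composite_reward(s, w_return=0.0) for s in session]
    print("composite return:", round(discounted_return(full), 3))
    print("immediate-only return:", round(discounted_return(immediate_only), 3))
```

Zeroing `w_return` removes the long-term signal, so the third step, where the user watched little but later came back, contributes much less to the return. A full DRL framework would learn a policy against a reward of this kind; this sketch only shows how the reward and the return are computed.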

Authors

Juan Di, Jinzhong University

DOI:

https://doi.org/10.31449/inf.v50i13.13064

Published

05/18/2026

How to Cite

Di, J. (2026). Optimizing Long-Term User Engagement in Short-Video Recommendation via Reinforcement Learning: A Markov Decision Process Framework with Composite Rewards. Informatica, 50(13). https://doi.org/10.31449/inf.v50i13.13064

Issue

Vol. 50 No. 13 (2026): Online-only issue

Section

Online-only

License

Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.

All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.

Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.