GEAR: A Counterfactual Multi-Agent Reinforcement Learning Framework for Strategic Resource Allocation in Game Recommendation Systems
Abstract
As the global game market continues to expand rapidly, personalized recommender systems have become essential for enhancing player experience and platform stickiness. However, conventional recommendation models, which typically maximize short-term interaction measures such as click-through rate, often lead to content homogenization and degrade the diversity of the content ecosystem. This myopic focus ultimately undermines long-term player retention. To address this fundamental difficulty, this work proposes a new paradigm that recasts the recommendation problem from what to recommend to how to recommend strategically. We model a game recommendation platform composed of multiple scenarios as a cooperative multi-agent system in which every scenario is an agent, and all agents share the common objective of optimizing the long-term ecosystem health of the platform. To this end, we design and implement a novel multi-agent reinforcement learning algorithm, GEAR (Guild-based Ecosystem-aware Allocation of Resources). We formulate the problem as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) and solve it with GEAR, which operates under the Centralized Training with Decentralized Execution (CTDE) framework. Its main innovation is a counterfactual credit assignment mechanism that enables each agent to assess its marginal contribution to a global, long-term utility function, a composite measure of player retention, content diversity, and user engagement. This mechanism addresses the non-stationarity and credit assignment problems inherent in multi-agent learning. We conduct extensive experiments on a purpose-built simulated game recommendation platform.
The results demonstrate that GEAR significantly outperforms static policies, independent learners, and state-of-the-art multi-agent baselines, including MADDPG and QMIX, on all key long-term metrics. Ablation studies further validate the critical contribution of the counterfactual mechanism to the algorithm's stability and performance. Furthermore, GEAR exhibits strategic flexibility, adapting its resource allocation policy to dynamic shifts in platform objectives. This research provides both a novel theoretical framework and an effective technical methodology for developing the next generation of self-managing, ecosystem-aware recommender systems.
DOI: https://doi.org/10.31449/inf.v49i31.10220
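
The abstract's notion of each agent assessing its marginal contribution to a global utility can be illustrated with a small difference-reward sketch. This is not the GEAR algorithm itself, whose details the abstract does not give. It is a minimal toy under stated assumptions: the utility weights, the `toy_utility` function mapping per-scenario allocations to retention, diversity, and engagement, and the "default action" used as the counterfactual are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence


def global_utility(retention: float, diversity: float, engagement: float,
                   weights: Sequence[float] = (0.5, 0.3, 0.2)) -> float:
    """Composite utility as a weighted sum (the weights are assumed, not from the paper)."""
    w_r, w_d, w_e = weights
    return w_r * retention + w_d * diversity + w_e * engagement


def toy_utility(allocations: Sequence[float]) -> float:
    """Hypothetical stand-in for a platform simulator.

    Each entry is one scenario's (agent's) resource share in [0, 1].
    """
    engagement = sum(allocations) / len(allocations)
    diversity = 1.0 - (max(allocations) - min(allocations))
    retention = min(allocations)
    return global_utility(retention, diversity, engagement)


def counterfactual_credit(joint_action: Sequence[float],
                          utility_fn: Callable[[Sequence[float]], float],
                          default_action: float = 0.0) -> List[float]:
    """Difference-reward style credit for each agent.

    Agent i's credit is U(joint action) minus U(joint action with agent i's
    action replaced by a default), holding all other agents' actions fixed.
    """
    actual = utility_fn(joint_action)
    credits = []
    for i in range(len(joint_action)):
        counterfactual = list(joint_action)
        counterfactual[i] = default_action  # swap out only agent i's action
        credits.append(actual - utility_fn(counterfactual))
    return credits


if __name__ == "__main__":
    allocations = [0.5, 0.5, 0.5]  # three scenarios, equal resource shares
    print(counterfactual_credit(allocations, toy_utility))
```

Because only one agent's action is changed at a time, each credit isolates that agent's effect on the shared utility, which is the intuition behind counterfactual credit assignment in cooperative multi-agent learning. An agent whose action already equals the default receives a credit of exactly zero.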
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







