Research on Automatic Sharding and Load Balancing of NoSQL Databases Based on Twin Delayed Deep Deterministic Policy Gradient (TD3)
Abstract
With the rapid development of cloud computing and big data technology, NoSQL databases face severe sharding and load-balancing challenges under dynamic workloads over massive data volumes. Traditional policies rely on static rules or threshold mechanisms and struggle to adapt to sudden traffic fluctuations and skewed data distributions, leading to frequent hotspot shards, increased cross-node query latency, and uneven resource utilization. This study proposes a dynamic optimization framework based on deep reinforcement learning. It collects multi-dimensional indicators such as cluster node load, network latency, and query patterns in real time (covering nine core parameters including shard data volume, node CPU/memory utilization, disk I/O, and query latency), constructs the state space through Min-Max normalization and weighted fusion to characterize the global state of the system, and designs a composite reward function comprising a throughput reward, a response-time reward, and a migration-cost penalty to achieve a balanced multi-objective optimization. In experiments on a Cassandra cluster (an open-source distributed database), the YCSB benchmark is used to simulate mixed-load and burst-traffic scenarios. Compared with traditional consistent hashing and weighted round-robin strategies, the method reduces the incidence of hotspot shards by 42% (from 23.7% to 13.8%) and average query latency by 35% (from 152 ms to 99 ms), and in a 10-node cluster it reduces the data migration volume by 28% relative to a threshold-triggered mechanism. By introducing the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, the agent effectively avoids local optima when dynamically adjusting shard boundaries and request routing, and in a 24-hour traffic-fluctuation test the standard deviation of system throughput is reduced by 61% (112 ops vs. 289 ops) compared with the traditional method.
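The state construction and composite reward described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the nine metric names, the fusion weights, and the reference scales (`tp_ref`, `rt_ref`, `mig_ref`) are assumptions introduced here for illustration.

```python
import numpy as np

# Illustrative names for the nine core parameters mentioned in the abstract;
# the paper's exact metric set and weights are not specified here.
METRICS = [
    "shard_data_volume", "cpu_util", "memory_util", "disk_io",
    "query_latency", "network_latency", "read_ratio", "write_ratio",
    "request_rate",
]

def min_max_normalize(x, lo, hi):
    """Scale each raw metric into [0, 1]; degenerate ranges map to 0."""
    span = np.where(hi > lo, hi - lo, 1.0)
    return np.clip((x - lo) / span, 0.0, 1.0)

def build_state(raw, lo, hi, weights):
    """State vector: Min-Max-normalized metrics scaled by fusion weights."""
    return min_max_normalize(raw, lo, hi) * weights

def composite_reward(throughput, latency_ms, migrated_bytes,
                     w_tp=1.0, w_rt=1.0, w_mig=0.5,
                     tp_ref=10_000.0, rt_ref=150.0, mig_ref=1e9):
    """Throughput reward minus response-time and migration-cost penalties."""
    r_tp = w_tp * (throughput / tp_ref)        # reward higher throughput
    r_rt = -w_rt * (latency_ms / rt_ref)       # penalize slow responses
    r_mig = -w_mig * (migrated_bytes / mig_ref)  # penalize data movement
    return r_tp + r_rt + r_mig
```

With such a shaping, an episode that keeps throughput high while moving little data scores well, whereas aggressive rebalancing is charged through the migration-cost term.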
After training for 500,000 steps, the algorithm converges 2.3 times faster than a traditional DQN, and the long-term return increases by 19%. Experimental results show that the reinforcement-learning-driven strategy significantly improves cluster resource utilization and service quality, and provides a new technical path for autonomous database management in complex dynamic environments.
DOI: https://doi.org/10.31449/inf.v49i28.10320
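The TD3 mechanisms the abstract credits for avoiding local optima (clipped double-Q targets, target policy smoothing, and delayed actor updates) can be sketched generically as below. The callables `q1_target`, `q2_target`, and `policy_target` are stand-ins for the paper's target networks over the cluster state; all hyperparameter values shown are the common TD3 defaults, not values reported by this paper.

```python
import numpy as np

def td3_target(reward, next_state, q1_target, q2_target, policy_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=None):
    """Clipped double-Q learning target: y = r + gamma * min(Q1', Q2')."""
    rng = rng or np.random.default_rng(0)
    a = policy_target(next_state)
    # Target policy smoothing: clipped Gaussian noise on the target action,
    # so the critic target is robust to small action perturbations.
    noise = np.clip(rng.normal(0.0, noise_std, size=np.shape(a)),
                    -noise_clip, noise_clip)
    a_smoothed = np.clip(a + noise, action_low, action_high)
    # Taking the minimum of the two target critics suppresses the
    # overestimation bias that can trap the agent in local optima.
    q_min = np.minimum(q1_target(next_state, a_smoothed),
                       q2_target(next_state, a_smoothed))
    return reward + gamma * q_min

def should_update_actor(step, policy_delay=2):
    """Delayed policy updates: train the actor every `policy_delay` critic steps."""
    return step % policy_delay == 0
```

In the sharding setting, the action produced by the policy would encode shard-boundary adjustments and request-routing decisions, with the composite reward described in the abstract fed into `td3_target`.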
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







