Hierarchical Multi-Agent Deep Reinforcement Learning for Coordinated Optimization of Aggregated Virtual Power Plants in Smart Microgrids

Abstract

The rapid expansion of distributed energy resources (DERs) has made distribution networks more complex to operate and has raised issues of computational efficiency, data privacy, and equitable benefit assessment. To address these challenges, this paper presents a two-tiered virtual power plant (VPP) coordination architecture that accounts for the operational interests of both Distribution System Operators (DSOs) and VPPs under AC optimal power flow (AC-OPF) constraints. The upper layer uses a penalty-function-enhanced OPF mechanism to guarantee network security in the event of voltage or branch-limit violations. The lower layer integrates an Asynchronous Advantage Actor-Critic (A3C) multi-agent architecture with a parameter-sharing Twin-Delayed Deep Deterministic Policy Gradient (PS-TD3) algorithm. Through lightweight parameter sharing and decentralized execution, each agent, which represents a VPP subsystem, learns energy-dispatch, storage, and flexibility decisions, which substantially reduces computational cost and preserves data privacy. In simulation tests on the IEEE 33-node distribution network, the proposed dual-layer multi-agent reinforcement learning (MARL) approach outperforms three baselines: non-cooperative TD3, traditional distributed OPF, and independent Q-learning. Thanks to parameter sharing, the PS-TD3 + A3C hybrid improves convergence speed by 42% and reduces per-step computing time by 37%. It also reduces voltage variation by 31.4%, network real-power losses by 26.7%, and operating cost by 18.2%. Because agents share only compressed gradients rather than raw operational data, privacy leakage is reduced by more than 80%.
The findings show that the proposed framework provides a computationally efficient, scalable, and privacy-preserving approach to coordinated VPP operation in contemporary DER-rich distribution systems.
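
The abstract does not give the form of the upper-layer penalty function, so the following is only a minimal sketch of how a penalty-augmented OPF objective might look. It assumes quadratic penalties on voltage and branch-limit violations; the function name `penalized_cost`, the 0.95 to 1.05 p.u. voltage band, and the penalty weights are all hypothetical choices for illustration, not values from the paper.

```python
from typing import Sequence


def penalized_cost(
    operating_cost: float,
    voltages_pu: Sequence[float],
    branch_flows: Sequence[float],
    branch_limits: Sequence[float],
    v_min: float = 0.95,      # assumed lower voltage bound (p.u.)
    v_max: float = 1.05,      # assumed upper voltage bound (p.u.)
    weight_v: float = 1000.0,  # assumed voltage penalty weight
    weight_s: float = 1000.0,  # assumed branch penalty weight
) -> float:
    """Return operating cost plus quadratic penalties for limit violations."""
    # Voltage violation: distance outside [v_min, v_max], zero when inside.
    v_pen = sum(max(0.0, v_min - v, v - v_max) ** 2 for v in voltages_pu)
    # Branch violation: flow magnitude above its limit, zero otherwise.
    s_pen = sum(
        max(0.0, abs(s) - lim) ** 2
        for s, lim in zip(branch_flows, branch_limits)
    )
    return operating_cost + weight_v * v_pen + weight_s * s_pen


# A feasible operating point incurs no penalty.
feasible = penalized_cost(100.0, [1.0, 1.02], [0.5], [1.0])
# An undervoltage bus and an overloaded branch both raise the cost.
violating = penalized_cost(100.0, [0.90], [1.2], [1.0])
```

With this form, a feasible point returns the bare operating cost, while any violation raises the objective in proportion to the square of its magnitude, which is one common way to steer learning agents back toward the secure operating region.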

Authors

  • Junxiong Zhang, Anhui Technical College of Industry and Economy, China
  • Jiao Wang, Anhui Jianzhu University, China
  • Qihang Wang, Anhui Technical College of Industry and Economy, China
  • Qi Zhou, Anhui Technical College of Industry and Economy, China

DOI:

https://doi.org/10.31449/inf.v50i11.12252

Published

04/23/2026

How to Cite

Zhang, J., Wang, J., Wang, Q., & Zhou, Q. (2026). Hierarchical Multi-Agent Deep Reinforcement Learning for Coordinated Optimization of Aggregated Virtual Power Plants in Smart Microgrids. Informatica, 50(11). https://doi.org/10.31449/inf.v50i11.12252