A Unified Taxonomy and Empirical Review of Recent Proximal Policy Optimization Variants and Their Real-World Applications
Abstract
Artificial Intelligence (AI) algorithms such as Proximal Policy Optimization (PPO) help train agents for sequential decision-making tasks. Existing surveys cover the original PPO and its early developments well but fall short of capturing the rapid evolution of PPO-based methods in recent years. Since 2023, a wave of algorithmic variants has emerged. These variants introduce different objectives, regularization approaches, exploration methods, training pipelines, and hybrid architectures across diverse applications. However, there has been no systematic effort to organize, compare, or critically assess these advances. This review addresses that gap by analyzing 32 peer-reviewed studies (2023–2025) to evaluate 15+ PPO variants across six innovation categories and ten application domains. The analysis shows that PPO-based methods deliver typical performance improvements of 15–44% over baselines across metrics, including safety constraint satisfaction (+15%), computational efficiency (+18% SLA compliance), and sim-to-real transfer (+23% task success). The review organizes these developments in a unified taxonomy focused on algorithmic advances and their performance in real-world scenarios. Three critical dimensions are considered in the evaluation: generalization across tasks and environments, robustness and safety in deployment, and computational efficiency in training and inference. The review also identifies recurring limitations, inconsistent evaluation practices, and underexplored directions. It exposes gaps between simulation benchmarks and real-world deployment conditions, including operational constraints and challenges. By connecting theoretical improvements to empirical outcomes, this work serves as a practical reference for engineers and researchers applying PPO today.
The synthesized taxonomy provides a structured reference for analyzing recent PPO variants and their empirical trade-offs.
DOI: https://doi.org/10.31449/inf.v50i13.12663
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







