Double Deep Q-Network with Experience Replay for Time Dependent Vehicle Routing Problem with Time Windows Under Historical Congestion Constraints

Abstract

This study addresses the Time-Dependent Vehicle Routing Problem with Time Windows (TD-VRPTW) for a single-vehicle urban distribution system in Jakarta. Time-dependent travel times are constructed from one week of hourly historical congestion profiles obtained from TomTom Traffic and preprocessed into time-varying speed factors that are mapped to 40- and 50-customer delivery instances with a common service window of 08:00–19:00. A Deep Q-Network (DQN) enhanced with Double DQN and Prioritized Experience Replay (PER) is trained end-to-end using a multilayer perceptron with two hidden layers (128 and 64 units, ReLU activations) to approximate the state–action value function. The reward function penalizes time-dependent travel time, lateness with respect to customer time windows, long inter-customer jumps, and inter-cluster moves, thereby shaping the policy toward both schedule adherence and congestion-aware routing. For each scenario, the agent is trained for 1,000 episodes under three random seeds and evaluated on three representative weekdays (Monday, Wednesday, and Friday). Across all settings, the learned policy achieves a 100% on-time delivery rate with zero late customers, with best time-dependent route costs of approximately 526–539 minutes for 40 customers and 595–617 minutes for 50 customers. Comparative experiments with Genetic Algorithm (GA) and Ant Colony Optimization (ACO) show that ACO attains the shortest travel times, while the proposed DQN+PER model yields routes that are only about 5–8% longer than ACO but reduces time-dependent travel cost by roughly 35–45% compared with GA on the same TD-VRPTW instances. Reward and loss trajectories exhibit smooth convergence, and a sensitivity analysis on the lateness penalty confirms that the main conclusions are robust to hyperparameter variations.
These findings demonstrate that leveraging historical congestion to build time-dependent travel times enables DQN-based control to produce competitive, congestion-aware solutions for TD-VRPTW in realistic urban distribution networks.
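As a concrete illustration of the two techniques named in the title, the Double DQN target computation and the PER sampling weights can be sketched in plain Python. This is a minimal sketch under assumed hyperparameters (gamma, alpha, beta), not the authors' implementation; the Q-values and TD errors below are made-up placeholders.

```python
# Hedged sketch of Double DQN + prioritized experience replay (PER).
# All hyperparameter values are illustrative, not the paper's settings.

def double_dqn_target(reward, q_next_online, q_next_target, gamma=0.99, done=False):
    """Double DQN target: the online net selects the next action,
    the target net evaluates it (reduces Q-value overestimation)."""
    if done:
        return reward
    a_star = max(range(len(q_next_online)), key=lambda a: q_next_online[a])
    return reward + gamma * q_next_target[a_star]

def per_weights(td_errors, alpha=0.6, beta=0.4, eps=1e-3):
    """Sampling probabilities proportional to |TD error|^alpha, plus
    importance-sampling weights (normalized to <= 1) that correct
    the bias introduced by non-uniform replay."""
    prios = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(prios)
    probs = [p / total for p in prios]
    n = len(td_errors)
    w = [(n * p) ** (-beta) for p in probs]
    w_max = max(w)
    return probs, [x / w_max for x in w]

# Tiny worked example with made-up Q-values for a 3-action state:
# online net picks action 1 (value 3.0), target net evaluates it (2.0),
# so the target is -12.0 + 0.9 * 2.0 = -10.2.
y = double_dqn_target(reward=-12.0,
                      q_next_online=[1.0, 3.0, 2.0],
                      q_next_target=[0.5, 2.0, 4.0],
                      gamma=0.9)
print(round(y, 2))  # -10.2
```

The key design point is the decoupling of action selection (online network) from action evaluation (target network), which counters the overestimation bias of vanilla DQN; PER then replays high-error transitions more often while the importance-sampling weights keep the gradient estimate unbiased.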

References

[1] B. Lin, B. Ghaddar, and J. Nathwani, "Deep reinforcement learning for the electric vehicle routing problem with time windows," IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 8, pp. 11528-11538, 2021. https://doi.org/10.1109/TITS.2021.3105232

[2] X. Zhang, Y. Yang, J. Cai, Q. Zhu, W. Chen, and Q. Lin, "Deep Reinforcement Learning-Based Multi-Agent Algorithm for Vehicle Routing Problem in Complex Logistics Scenarios," in 2024 International Joint Conference on Neural Networks (IJCNN), 2024, pp. 1-8. https://doi.org/10.1109/IJCNN60899.2024.10650335

[3] W. Pan and S. Q. Liu, "Deep reinforcement learning for the dynamic and uncertain vehicle routing problem," Applied Intelligence, vol. 53, no. 1, pp. 405-422, 2023. https://doi.org/10.1007/s10489-022-03456-w

[4] F. Guo, Q. Wei, M. Wang, Z. Guo, and S. W. Wallace, "Deep attention models with dimension-reduction and gate mechanisms for solving practical time-dependent vehicle routing problems," Transportation Research Part E: Logistics and Transportation Review, vol. 173, p. 103095, 2023. https://doi.org/10.1016/j.tre.2023.103095

[5] H. Ben Ticha, N. Absi, D. Feillet, A. Quilliot, and T. Van Woensel, "The time-dependent vehicle routing problem with time windows and road-network information," Operations Research Forum, vol. 2, no. 1, p. 4, 2021. https://doi.org/10.1007/s43069-020-00049-6

[6] M. Gmira, M. Gendreau, A. Lodi, and J.-Y. Potvin, "Tabu search for the time-dependent vehicle routing problem with time windows on a road network," European Journal of Operational Research, vol. 288, no. 1, pp. 129-140, 2021. https://doi.org/10.1016/j.ejor.2020.05.041

[7] M. Ammouriova, E. M. Herrera, M. Neroni, A. A. Juan, and J. Faulin, "Solving vehicle routing problems under uncertainty and in dynamic scenarios: From simheuristics to agile optimization," Applied Sciences, vol. 13, no. 1, p. 101, 2022. https://doi.org/10.3390/app13010101

[8] Y. Yunita, D. Stiawan, and D. P. Rini, "Vehicle Routing Problem with Time Windows using Hybrid Metaheuristic Dragonfly Algorithm and Variable Neighborhood Search: Work on Progress," in 2024 11th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), 2024, pp. 521-525. https://doi.org/10.1109/EECSI63442.2024.10776058

[9] S. Ara et al., "Vehicle Routing Problem Solving Using Reinforcement Learning," in 2023 26th International Conference on Computer and Information Technology (ICCIT), 2023, pp. 1-6. https://doi.org/10.1109/ICCIT60459.2023.10441644

[10] A. Gupta, S. Ghosh, and A. Dhara, "Deep reinforcement learning algorithm for fast solutions to vehicle routing problem with time-windows," in Proceedings of the 5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD), 2022, pp. 236-240. https://doi.org/10.1145/3493700.349372

[11] S. Kosolsombat and C. Ratanavilisagul, "Applied Deep Reinforcement Learning for Solving the Vehicle Routing Problem with Time Windows," in 2023 8th International Conference on Computational Intelligence and Applications (ICCIA), 2023, pp. 21-25. https://doi.org/10.1109/ICCIA59741.2023.00012

[12] F. Moreno-Vera, "Performing deep recurrent double q-learning for atari games," in 2019 IEEE Latin American Conference on Computational Intelligence (LA-CCI), 2019, pp. 1-4. https://doi.org/10.1109/LA-CCI47412.2019.9036763

[13] J. Escobar-Naranjo, G. Caiza, P. Ayala, E. Jordan, C. A. Garcia, and M. V. Garcia, "Autonomous navigation of robots: optimization with DQN," Applied Sciences, vol. 13, no. 12, p. 7202, 2023. https://doi.org/10.3390/app13127202

[14] Y. Yang, "Path planning under high-dimensional input states based on deep q-network," Highlights in Science, Engineering and Technology, vol. 120, pp. 576-585, 2024. https://doi.org/10.54097/r6fs0580

[15] D. Seng, J. Zhang, and X. Shi, "Visual Analysis of Deep Q-network," KSII Transactions on Internet and Information Systems, vol. 15, no. 3, 2021. https://doi.org/10.3837/tiis.2021.03.003

[16] C. Liu, G. Kou, X. Zhou, Y. Peng, H. Sheng, and F. E. Alsaadi, "Time-dependent vehicle routing problem with time windows of city logistics with a congestion avoidance approach," Knowledge-Based Systems, vol. 188, p. 104813, 2020. https://doi.org/10.1016/j.knosys.2019.06.021

[17] H. Fan, Y. Zhang, P. Tian, Y. Lv, and H. Fan, "Time-dependent multi-depot green vehicle routing problem with time windows considering temporal-spatial distance," Computers & Operations Research, vol. 129, p. 105211, 2021. https://doi.org/10.1016/j.cor.2021.105211

[18] L. Wang, S. Gao, K. Wang, T. Li, L. Li, and Z. Chen, "Time-Dependent Electric Vehicle Routing Problem with Time Windows and Path Flexibility," Journal of Advanced Transportation, vol. 2020, no. 1, p. 3030197, 2020. https://doi.org/10.1155/2020/3030197

[19] M. A. Cruz-Chávez, A. Rodríguez-León, R. Rivera-López, and M. H. Cruz-Rosales, "A Grid-Based Genetic Approach to Solving the Vehicle Routing Problem with Time Windows," Applied Sciences, vol. 9, no. 18, p. 3656, 2019. https://doi.org/10.3390/app9183656

[20] G. Chen, J. Gao, and D. Chen, "Research on Vehicle Routing Problem with Time Windows Based on Improved Genetic Algorithm and Ant Colony Algorithm," Electronics, vol. 14, no. 4, 2025. https://doi.org/10.3390/electronics14040647

[21] J. Cai, X. Zhang, Q. Lin, L. Dong, W. Chen, and Z. Ming, "Deep Reinforcement Learning for Solving the Vehicle Routing Problem in Practical Logistics," in 2024 IEEE Congress on Evolutionary Computation (CEC), 2024, pp. 1-8. https://doi.org/10.1109/CEC60901.2024.10612190

[22] B. Yue, J. Ma, J. Shi, and J. Yang, "A deep reinforcement learning-based adaptive search for solving time-dependent green vehicle routing problem," IEEE Access, vol. 12, pp. 33400-33419, 2024. https://doi.org/10.1109/ACCESS.2024.3369474

[23] M. Patil, P. Tambolkar, and S. Midlam-Mohler, "Optimizing Traffic Routes With Enhanced Double Q-Learning," IET Intelligent Transport Systems, vol. 19, no. 1, p. e70002, 2025. https://doi.org/10.1049/itr2.70002

[24] Z. Zhu, C. Hu, C. Zhu, Y. Zhu, and Y. Sheng, "An improved dueling deep double-q network based on prioritized experience replay for path planning of unmanned surface vehicles," Journal of Marine Science and Engineering, vol. 9, no. 11, p. 1267, 2021. https://doi.org/10.3390/jmse9111267

[25] Q. Huo, "Multi-objective vehicle path planning based on DQN," in International Conference on Cloud Computing, Performance Computing, and Deep Learning (CCPCDL 2022), 2022, vol. 12287, pp. 351-357. https://doi.org/10.1117/12.2640707

[26] Y. Niu, F. Zhu, and P. Zhai, "An autonomous decision-making algorithm for ship collision avoidance based on DDQN with prioritized experience replay," in 2023 7th International Conference on Transportation Information and Safety (ICTIS), 2023, pp. 1174-1180. https://doi.org/10.1109/ICTIS60134.2023.10243882

[27] S. Moon, S. Koo, Y. Lim, and H. Joo, "Routing control optimization for autonomous vehicles in mixed traffic flow based on deep reinforcement learning," Applied Sciences, vol. 14, no. 5, p. 2214, 2024. https://doi.org/10.3390/app14052214

[28] L. P. A. Sanchez, Y. Shen, and M. Guo, "MDQ: A QoS-congestion aware deep reinforcement learning approach for multi-path routing in SDN," Journal of Network and Computer Applications, vol. 235, p. 104082, 2025. https://doi.org/10.1016/j.jnca.2024.104082

[29] T. Carić and J. Fosin, "Using congestion zones for solving the time dependent vehicle routing problem," Promet - Traffic & Transportation, vol. 32, no. 1, pp. 25-38, 2020. https://doi.org/10.7307/ptt.v32i1.3296

Authors

  • Rina Refianti Gunadarma University
  • Alifurrohman Alifurrohman Gunadarma University
  • Eri Prasetyo Wibowo Gunadarma University
  • Ina Siti Hasanah Gunadarma University
  • Achmad Benny Mutiara Gunadarma University

DOI:

https://doi.org/10.31449/inf.v50i9.12122

Published

03/12/2026

How to Cite

Refianti, R., Alifurrohman, A., Wibowo, E. P., Hasanah, I. S., & Mutiara, A. B. (2026). Double Deep Q-Network with Experience Replay for Time Dependent Vehicle Routing Problem with Time Windows Under Historical Congestion Constraints. Informatica, 50(9). https://doi.org/10.31449/inf.v50i9.12122