A multi-depot dynamic vehicle routing problem with stochastic road capacity: an MDP model and dynamic policy for post-decision state rollout algorithm in reinforcement learning
| Main Authors: | Anuar, Wadi Khalid; Lee, Lai Soon; Seow, Hsin-Vonn; Pickl, Stefan |
|---|---|
| Format: | Article |
| Published: | Multidisciplinary Digital Publishing Institute, 2022 |
| Online Access: | http://psasir.upm.edu.my/id/eprint/100189/ |

| Field | Value |
|---|---|
| author | Anuar, Wadi Khalid; Lee, Lai Soon; Seow, Hsin-Vonn; Pickl, Stefan |
| building | UPM Institutional Repository |
| collection | Online Access |
| description | In the event of a disaster, the road network is often compromised in terms of its capacity and usability conditions. This is a challenge for humanitarian operations in the context of delivering critical medical supplies. To optimise vehicle routing for such a problem, a Multi-Depot Dynamic Vehicle-Routing Problem with Stochastic Road Capacity (MDDVRPSRC) is formulated as a Markov Decision Process (MDP) model. An Approximate Dynamic Programming (ADP) solution method is adopted, with the Post-Decision State Rollout Algorithm (PDS-RA) applied as the lookahead approach. To perform the rollout effectively for this problem, the PDS-RA is executed for every vehicle assigned to the problem, after which the agent makes a decision. Five constructive base heuristics are proposed for the PDS-RA. First, the Teach Base Insertion Heuristic (TBIH-1) is proposed to study the partial random construction approach for non-obvious decisions. This heuristic is then extended into TBIH-2 and TBIH-3, which show how the Sequential Insertion Heuristic (SIH) (I1) and the Clarke and Wright (CW) heuristic, respectively, can be executed in a dynamic setting as modifications of TBIH-1. Two further heuristics, TBIH-4 and TBIH-5 (TBIH-1 extended with Dynamic Lookahead SIH (DLASIH) and Dynamic Lookahead CW (DLACW), respectively), are proposed to improve the decision rule constructed on the go (dynamic policy on the go) in the lookahead simulations. The results are compared with the PDS-RA-based matheuristic approach from previous work. |
| first_indexed | 2025-11-15T13:30:03Z |
| format | Article |
| id | upm-100189 |
| institution | Universiti Putra Malaysia |
| institution_category | Local University |
| last_indexed | 2025-11-15T13:30:03Z |
| publishDate | 2022 |
| publisher | Multidisciplinary Digital Publishing Institute |
| recordtype | eprints |
| repository_type | Digital Repository |
| citation | Anuar, Wadi Khalid; Lee, Lai Soon; Seow, Hsin-Vonn; Pickl, Stefan (2022) A multi-depot dynamic vehicle routing problem with stochastic road capacity: an MDP model and dynamic policy for post-decision state rollout algorithm in reinforcement learning. Mathematics, 10 (15). art. no. 2699. pp. 1-70. ISSN 2227-7390 |
| publication_date | 2022-07-30 |
| status | Peer reviewed |
| doi | 10.3390/math10152699 |
| publisher_url | https://www.mdpi.com/2227-7390/10/15/2699 |
| datestamp | 2024-07-15T03:30:09Z |
| title | A multi-depot dynamic vehicle routing problem with stochastic road capacity: an MDP model and dynamic policy for post-decision state rollout algorithm in reinforcement learning |
| url | http://psasir.upm.edu.my/id/eprint/100189/ |
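
The description summarises the Post-Decision State Rollout Algorithm only at a high level. As an illustration of the general rollout idea, and not of the authors' MDDVRPSRC model or their TBIH heuristics, the following is a minimal Python sketch under stated assumptions: a single vehicle that always returns to depot 0, stochastic road capacity approximated by each road being degraded with probability `p_degrade` (multiplying its travel time by `factor`), and a nearest-neighbour completion standing in for the base heuristic. All function names and parameters are hypothetical. For each candidate next customer, the sketch forms the post-decision state (vehicle committed to that customer), simulates the base heuristic over sampled road scenarios, and picks the candidate with the lowest average cost.

```python
import random
from statistics import mean

# Hypothetical toy model, NOT the MDDVRPSRC model from the article:
# one vehicle, depot = node 0, customers = nodes 1..n-1, symmetric base
# travel times. "Reduced road capacity" is approximated by each road being
# degraded with probability p_degrade, multiplying its travel time by factor.


def sample_scenario(base, rng, p_degrade, factor):
    """Draw one realisation of road conditions as a travel-time matrix."""
    n = len(base)
    times = [row[:] for row in base]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p_degrade:
                times[i][j] = times[j][i] = base[i][j] * factor
    return times


def base_heuristic_cost(times, start, remaining):
    """Nearest-neighbour completion (hypothetical base heuristic):
    visit all `remaining` customers from `start`, then return to depot 0."""
    cost, pos, left = 0.0, start, set(remaining)
    while left:
        # Closest unvisited customer; ties broken by the smaller index.
        nxt = min(left, key=lambda c: (times[pos][c], c))
        cost += times[pos][nxt]
        pos = nxt
        left.remove(nxt)
    return cost + times[pos][0]


def rollout_decision(base, pos, remaining, n_samples=200,
                     p_degrade=0.3, factor=3.0, seed=0):
    """One-step rollout: score each post-decision state by simulation."""
    rng = random.Random(seed)
    # Common random numbers: every candidate is scored on the same scenarios.
    scenarios = [sample_scenario(base, rng, p_degrade, factor)
                 for _ in range(n_samples)]
    best, best_value = None, float("inf")
    for c in sorted(remaining):
        rest = set(remaining) - {c}  # post-decision state: committed to c
        value = mean(s[pos][c] + base_heuristic_cost(s, c, rest)
                     for s in scenarios)
        if value < best_value:
            best, best_value = c, value
    return best, best_value


if __name__ == "__main__":
    # Depot and three customers on a line; travel time = distance.
    line = [[abs(i - j) for j in range(4)] for i in range(4)]
    print(rollout_decision(line, 0, {1, 2, 3}))
```

With `p_degrade=0.0` on this line instance, visiting customer 1 first and visiting customer 3 first both cost 6, and the tie goes to the smaller index, so the call returns `(1, 6.0)`. Reusing the same sampled scenarios for every candidate reduces the variance of the comparison between decisions. The base heuristic here sees each sampled scenario in full, which makes its estimates optimistic; a heuristic that only observes roads as it traverses them would be closer to the dynamic setting described in the abstract.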