Twin delayed deep deterministic policy gradient-based target tracking for unmanned aerial vehicle with achievement rewarding and multistage training

Target tracking using an unmanned aerial vehicle (UAV) is a challenging robotic problem. It requires handling a high level of nonlinearity and dynamics. Model-free control effectively handles the uncertain nature of the problem, and reinforcement learning (RL)-based approaches are a good candidate...

Bibliographic Details
Main Authors: Abo Mosali, Najmaddin; Shamsudin, Syariful Syafiq; Alfandi, Omar; Omar, Rosli; AL-Fadhali, Najib
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers 2022
Subjects:
Online Access: http://eprints.uthm.edu.my/6913/
http://eprints.uthm.edu.my/6913/1/J14006_28269837c18b517045750a7bc07cf431.pdf
_version_ 1848888947594231808
author Abo Mosali, Najmaddin
Shamsudin, Syariful Syafiq
Alfandi, Omar
Omar, Rosli
AL-Fadhali, Najib
author_facet Abo Mosali, Najmaddin
Shamsudin, Syariful Syafiq
Alfandi, Omar
Omar, Rosli
AL-Fadhali, Najib
author_sort Abo Mosali, Najmaddin
building UTHM Institutional Repository
collection Online Access
description Target tracking using an unmanned aerial vehicle (UAV) is a challenging robotic problem. It requires handling a high level of nonlinearity and dynamics. Model-free control effectively handles the uncertain nature of the problem, and reinforcement learning (RL)-based approaches are a good candidate for solving this problem. In this article, the Twin Delayed Deep Deterministic Policy Gradient Algorithm (TD3), a recent and composite RL architecture, was explored as a tracking agent for the UAV-based target tracking problem. Several improvements to the original TD3 were also made. First, the proportional-differential controller was used to boost the exploration of the TD3 in training. Second, a novel reward formulation for UAV-based target tracking enabled a careful combination of the various dynamic variables in the reward functions. This was accomplished by incorporating two exponential functions that limit the effect of velocity and acceleration, to prevent deformation of the policy function approximation. In addition, the concept of multistage training based on the dynamic variables was proposed as an opposing concept to one-stage combinatory training. Third, the rewarding function was enhanced with a piecewise decomposition to enable more stable learning behaviour of the policy and to move from a linear reward to an achievement formula. The training was conducted on fixed target tracking followed by moving target tracking. The flight testing was conducted on three types of target trajectories: fixed, square, and blinking. The multistage training achieved the best performance with both exponential and achievement rewarding for the fixed-trained agent with the fixed and square moving targets, and for the combined agent with both exponential and achievement rewarding for a fixed-trained agent in the case of a blinking target. Relative to the traditional proportional-differential controller, the maximum error reduction rate is 86%. The developed achievement rewarding and the multistage training open the door to various applications of RL in target tracking.
first_indexed 2025-11-15T20:18:23Z
format Article
id uthm-6913
institution Universiti Tun Hussein Onn Malaysia
institution_category Local University
language English
last_indexed 2025-11-15T20:18:23Z
publishDate 2022
publisher Institute of Electrical and Electronics Engineers
recordtype eprints
repository_type Digital Repository
spelling uthm-6913 2022-04-12T06:55:08Z http://eprints.uthm.edu.my/6913/ Twin delayed deep deterministic policy gradient-based target tracking for unmanned aerial vehicle with achievement rewarding and multistage training Abo Mosali, Najmaddin Shamsudin, Syariful Syafiq Alfandi, Omar Omar, Rosli AL-Fadhali, Najib TL500-777 Aeronautics. Aeronautical engineering Institute of Electrical and Electronics Engineers 2022 Article PeerReviewed text en http://eprints.uthm.edu.my/6913/1/J14006_28269837c18b517045750a7bc07cf431.pdf Abo Mosali, Najmaddin and Shamsudin, Syariful Syafiq and Alfandi, Omar and Omar, Rosli and AL-Fadhali, Najib (2022) Twin delayed deep deterministic policy gradient-based target tracking for unmanned aerial vehicle with achievement rewarding and multistage training. IEEE Access, 10. pp. 23545-23559. ISSN 2169-3536 https://doi.org/10.1109/ACCESS.2022.3154388
spellingShingle TL500-777 Aeronautics. Aeronautical engineering
Abo Mosali, Najmaddin
Shamsudin, Syariful Syafiq
Alfandi, Omar
Omar, Rosli
AL-Fadhali, Najib
Twin delayed deep deterministic policy gradient-based target tracking for unmanned aerial vehicle with achievement rewarding and multistage training
title Twin delayed deep deterministic policy gradient-based target tracking for unmanned aerial vehicle with achievement rewarding and multistage training
title_full Twin delayed deep deterministic policy gradient-based target tracking for unmanned aerial vehicle with achievement rewarding and multistage training
title_fullStr Twin delayed deep deterministic policy gradient-based target tracking for unmanned aerial vehicle with achievement rewarding and multistage training
title_full_unstemmed Twin delayed deep deterministic policy gradient-based target tracking for unmanned aerial vehicle with achievement rewarding and multistage training
title_short Twin delayed deep deterministic policy gradient-based target tracking for unmanned aerial vehicle with achievement rewarding and multistage training
title_sort twin delayed deep deterministic policy gradient-based target tracking for unmanned aerial vehicle with achievement rewarding and multistage training
topic TL500-777 Aeronautics. Aeronautical engineering
url http://eprints.uthm.edu.my/6913/
http://eprints.uthm.edu.my/6913/1/J14006_28269837c18b517045750a7bc07cf431.pdf