Vision-based human detection by fine-tuned SSD models

Human-robot interaction (HRI) and human-robot collaboration (HRC) have become more popular as industries take the initiative to realize the era of automation and digitalization. The introduction of robots is often considered a risk because robots do not possess the intelligence that humans...

Bibliographic Details
Main Authors: Tang, Jin Cheng, Ab. Nasir, Ahmad Fakhri, Mohd Razman, Mohd Azraai, P. P. Abdul Majeed, Anwar, Thai, Li Lim
Format: Article
Language: English
Published: The Science and Information (SAI) Organization Limited 2022
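Subjects: QA76 Computer software
T Technology (General)
TK Electrical engineering. Electronics Nuclear engineering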
Online Access: http://umpir.ump.edu.my/id/eprint/35808/
http://umpir.ump.edu.my/id/eprint/35808/1/Paper_43-Vision_based_Human_Detection_by_Fine_Tuned_SSD_Models.pdf
building UMP Institutional Repository
collection Online Access
description Human-robot interaction (HRI) and human-robot collaboration (HRC) have become more popular as industries take the initiative to realize the era of automation and digitalization. The introduction of robots is often considered a risk because robots do not possess the intelligence that humans do. However, the literature that uses deep learning as the basis for improving HRI safety is limited, let alone the transfer learning approach. Hence, this study empirically examines the efficacy of the transfer learning approach in the human detection task by fine-tuning SSD models. A custom image dataset was developed using the surveillance system at TT Vision Holdings Berhad and annotated accordingly. The dataset was then partitioned into training, validation, and test sets in a ratio of 70:20:10. The learning behaviour of the models was monitored throughout the fine-tuning process via the total loss graph. The results reveal that the fine-tuned SSD model with MobileNetV1 achieved 87.20% test AP, which is 6.1% higher than the fine-tuned SSD model with MobileNetV2. As a trade-off, the fine-tuned SSD model with MobileNetV1 attained an inference time of 46.2 ms on an RTX 3070, which is 9.6 ms slower than the fine-tuned SSD model with MobileNetV2. Taking test AP as the key metric, the fine-tuned SSD model with MobileNetV1 is considered the best fine-tuned model in this study. In conclusion, the study shows that the transfer learning approach within the deep learning domain can help protect humans from risk by detecting them in the first place.
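The 70:20:10 partition mentioned in the description can be illustrated with a minimal sketch. This is not the authors' code: the function name `split_dataset`, the fixed seed, the shuffle-then-slice strategy, and the file names in the usage note are all assumptions made for illustration, and the record does not state the actual dataset size.

```python
import random


def split_dataset(items, ratios=(70, 20, 10), seed=0):
    """Shuffle items and split them into train/validation/test subsets.

    `ratios` are integer percentages (assumed here to be 70:20:10, as in
    the description). The seed is a hypothetical choice so the split is
    reproducible; the record does not say how the authors shuffled.
    """
    if sum(ratios) != 100:
        raise ValueError("ratios must sum to 100")

    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    n_train = n * ratios[0] // 100  # integer arithmetic avoids float rounding
    n_val = n * ratios[1] // 100

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test
```

For example, calling `split_dataset([f"img_{i:04d}.jpg" for i in range(1000)])` on 1000 hypothetical file names yields three disjoint lists of 700, 200, and 100 names. Any images left over from integer division fall into the test set, which is one possible design choice among several.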
first_indexed 2025-11-15T03:19:59Z
format Article
id ump-35808
institution Universiti Malaysia Pahang
institution_category Local University
language English
last_indexed 2025-11-15T03:19:59Z
publishDate 2022
publisher The Science and Information (SAI) Organization Limited
recordtype eprints
repository_type Digital Repository
spelling ump-35808 2022-12-05T04:08:28Z http://umpir.ump.edu.my/id/eprint/35808/ The Science and Information (SAI) Organization Limited 2022 Article PeerReviewed pdf en cc_by_4 http://umpir.ump.edu.my/id/eprint/35808/1/Paper_43-Vision_based_Human_Detection_by_Fine_Tuned_SSD_Models.pdf
Tang, Jin Cheng and Ab. Nasir, Ahmad Fakhri and Mohd Razman, Mohd Azraai and P. P. Abdul Majeed, Anwar and Thai, Li Lim (2022) Vision-based human detection by fine-tuned SSD models. International Journal of Advanced Computer Science and Applications (IJACSA), 13 (11). pp. 386-390. ISSN 2156-5570 (Online). (Published) https://thesai.org/Downloads/Volume13No11/Paper_43-Vision_based_Human_Detection_by_Fine_Tuned_SSD_Models.pdf
topic QA76 Computer software
T Technology (General)
TK Electrical engineering. Electronics Nuclear engineering
url http://umpir.ump.edu.my/id/eprint/35808/
http://umpir.ump.edu.my/id/eprint/35808/1/Paper_43-Vision_based_Human_Detection_by_Fine_Tuned_SSD_Models.pdf