Human detection and tracking with YOLO and SORT tracking algorithm

Bibliographic Details
Main Authors: Kader, Tanveer; Ahmad Fakhri, Ab Nasir; Muhammad Zulfahmi, Toh Abdullah@ Toh Chin Lai; Muhammad Nur Aiman, Shapiee; Amir Fakarullsroq, Abdul Razak
Format: Article
Language: English
Published: The Science and Information (SAI) Organization Limited 2025
Online Access: http://umpir.ump.edu.my/id/eprint/45070/
http://umpir.ump.edu.my/id/eprint/45070/1/Paper_14-Human_Detection_and_Tracking_with_YOLO.pdf
Description
Summary: Human tracking is often performed on publicly available, well-annotated datasets, because developing a new dataset is a tedious process that is typically avoided. Such datasets are well suited to training because they yield higher tracking accuracy. This paper performs human tracking on manually recorded videos, using optimized detectors within the tracking-by-detection framework. The recorded videos were used to build a dataset of more than 8k image sequences. Both indoor and outdoor scenes were included to capture varied lighting conditions, which make tracking more difficult. Every frame is labelled with bounding boxes for humans, and the dataset follows the MOT15 dataset structure. A unique annotation process, combining manual annotation with predictions from pretrained models, reduced the annotation labour by almost 80%. You Only Look Once (YOLO) detection models of different sizes (n/s/m) were trained on the training set for the human class and coupled with two of the most popular tracking algorithms, Simple Online and Realtime Tracking (SORT) and DeepSORT. The YOLOv8 and YOLO11 models were optimized with suitable hyperparameter values before tracking with SORT and DeepSORT, and the results were examined across different confidence and Intersection over Union (IoU) threshold values. The study finds a proportional relationship between detector optimization and tracking accuracy. YOLO11m with the DeepSORT tracker performed best on the test data, reaching 74% Multiple Object Tracking Accuracy (MOTA), and the other optimized YOLO models also tended to perform better with the trackers than the unoptimized ones.
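The summary describes a MOT15-structured dataset whose labels combine manual annotation with predictions from pretrained models. The paper's exact pipeline is not given in this record, so the Python sketch below is only an illustration under stated assumptions: predictions already carry track IDs, rows use the MOTChallenge 2D text layout (frame, id, bb_left, bb_top, bb_width, bb_height, conf, x, y, z, with -1 for the unused world coordinates, and conf assumed to be 1 for ground-truth rows), and a hypothetical `review_thresh` flags low-score predictions for a human annotator to check. The names `to_mot15_line` and `draft_annotations` are illustrative, not from the paper.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def to_mot15_line(frame: int, track_id: int, box: Box, conf: float = 1.0) -> str:
    """Format one box as a MOT15-style row:
    frame, id, bb_left, bb_top, bb_width, bb_height, conf, x, y, z.
    The x, y, z world coordinates are unused in 2D MOT15 and set to -1."""
    x1, y1, x2, y2 = box
    return (f"{frame},{track_id},{x1:.2f},{y1:.2f},"
            f"{x2 - x1:.2f},{y2 - y1:.2f},{conf:.2f},-1,-1,-1")


def draft_annotations(preds: Dict[int, List[Tuple[int, Box, float]]],
                      review_thresh: float = 0.6) -> Tuple[List[str], List[str]]:
    """Turn pretrained-model predictions into draft MOT15 ground-truth rows.

    Hypothetical helper: `preds` maps a 1-based frame number to a list of
    (track_id, box, score) tuples. Every prediction becomes a draft row
    (conf assumed to be 1 for ground truth); rows whose model score is below
    `review_thresh` are also returned separately for manual checking.
    """
    rows: List[str] = []
    to_review: List[str] = []
    for frame in sorted(preds):  # MOT files are ordered by frame number
        for track_id, box, score in preds[frame]:
            line = to_mot15_line(frame, track_id, box, 1.0)
            rows.append(line)
            if score < review_thresh:
                to_review.append(line)
    return rows, to_review
```

In this design the annotator edits only the flagged rows plus any missed people, rather than drawing every box from scratch; the threshold value is an assumption to be tuned, not a figure from the paper.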
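Results are reported across confidence and IoU threshold values. The summary does not say whether the IoU threshold applies to the detector's non-maximum suppression or to the tracker's association step; the Python sketch below illustrates the association use. SORT matches Kalman-filter-predicted track boxes to new detections with the Hungarian algorithm on an IoU cost and rejects pairs below a minimum IoU. As a simplification, this sketch omits the Kalman filter and substitutes greedy highest-IoU-first matching for the Hungarian algorithm; `associate`, `conf_thresh`, `iou_thresh`, and their default values are hypothetical, not the paper's settings.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks: List[Box], detections: List[Tuple[Box, float]],
              conf_thresh: float = 0.5, iou_thresh: float = 0.3
              ) -> Tuple[List[Tuple[int, int]], List[int], List[int]]:
    """Greedy IoU association (a simplification of SORT's Hungarian matching).

    `tracks` are predicted track boxes; `detections` are (box, score) pairs.
    Returns (matches, unmatched_tracks, unmatched_detections), using the
    original indices into `tracks` and `detections`.
    """
    # Drop low-confidence detections but remember their original indices.
    kept = [(i, box) for i, (box, score) in enumerate(detections) if score >= conf_thresh]
    # Score every (track, detection) pair by IoU, highest overlap first.
    pairs = sorted(
        ((iou(t, box), ti, di) for ti, t in enumerate(tracks) for di, box in kept),
        reverse=True,
    )
    matches: List[Tuple[int, int]] = []
    used_t, used_d = set(), set()
    for overlap, ti, di in pairs:
        if overlap < iou_thresh:
            break  # all remaining pairs overlap even less
        if ti in used_t or di in used_d:
            continue  # each track and detection is matched at most once
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    # In SORT, unmatched detections would start new tracks and long-unmatched
    # tracks would be deleted; that bookkeeping is omitted here.
    unmatched_tracks = [ti for ti in range(len(tracks)) if ti not in used_t]
    unmatched_dets = [di for di, _ in kept if di not in used_d]
    return matches, unmatched_tracks, unmatched_dets
```

Raising `conf_thresh` removes weak detections before matching, while raising `iou_thresh` makes the tracker reject loosely overlapping pairs. Greedy matching can differ from the optimal Hungarian assignment in crowded scenes, which is why SORT uses the latter.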
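The headline result is stated as MOTA. Under the standard CLEAR MOT definition, MOTA is one minus the ratio of all false negatives, false positives, and identity switches to the number of ground-truth objects, each summed over all frames. The short Python sketch below computes it from per-frame counts; the function name `mota` is illustrative, and producing the counts (matching tracker output to ground truth) is assumed to happen elsewhere.

```python
from typing import Sequence


def mota(fn: Sequence[int], fp: Sequence[int], idsw: Sequence[int],
         gt: Sequence[int]) -> float:
    """CLEAR MOT accuracy from per-frame counts:
    1 - (sum of FN + sum of FP + sum of ID switches) / sum of ground-truth objects."""
    total_gt = sum(gt)
    if total_gt == 0:
        raise ValueError("MOTA is undefined when there are no ground-truth objects")
    errors = sum(fn) + sum(fp) + sum(idsw)
    return 1.0 - errors / total_gt
```

For example, with made-up counts over two frames, `mota([2, 1], [1, 0], [0, 1], [10, 10])` returns 0.75 (5 errors against 20 ground-truth objects). Because the error total can exceed the number of ground-truth objects, MOTA can also be negative.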