Human detection and tracking with YOLO and SORT tracking algorithm
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | The Science and Information (SAI) Organization Limited, 2025 |
| Subjects: | |
| Online Access: | http://umpir.ump.edu.my/id/eprint/45070/ http://umpir.ump.edu.my/id/eprint/45070/1/Paper_14-Human_Detection_and_Tracking_with_YOLO.pdf |
| Summary: | Human tracking is often performed on publicly available, well-annotated datasets, and building a new dataset is usually avoided because annotation is laborious. Such datasets are well suited to training because they tend to yield higher tracking accuracy. This paper instead performs human tracking on manually recorded videos, using optimized detectors within the tracking-by-detection framework. The recorded videos were used to develop a dataset comprising more than 8k image sequences. Both indoor and outdoor scenarios were included so that lighting conditions vary, which makes tracking more difficult. Every image frame is labelled with bounding boxes for humans, and the dataset follows the MOT15 dataset structure. A hybrid annotation process, combining manual annotation with predictions from pretrained models, reduced the annotation labour by almost 80%. You Only Look Once (YOLO) detection models of different sizes (n/s/m) were trained on the training split, focusing on humans, and coupled with two of the most popular tracking algorithms, Simple Online and Realtime Tracking (SORT) and DeepSORT. The YOLOv8 and YOLO11 models were optimized through hyperparameter tuning and then used for tracking with SORT and DeepSORT. Results were compared across different confidence and Intersection over Union (IoU) threshold values. The study finds that tracking accuracy improves as the detection model is optimized. YOLO11m with the DeepSORT tracker performed best on the test data, reaching 74% Multiple Object Tracking Accuracy (MOTA), and the other optimized YOLO models also tended to perform better with the trackers than their unoptimized counterparts. |
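
The summary reports results in terms of IoU thresholds and MOTA without defining either. As a minimal sketch, the two quantities can be computed as follows, assuming boxes in the (left, top, width, height) layout used by MOTChallenge-style annotation files and the standard CLEAR MOT definition of MOTA. The function names and the counts in the usage lines are hypothetical and are not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (left, top, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with all counts summed over every frame."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


# Hypothetical values for illustration only.
overlap = iou((0, 0, 10, 10), (5, 0, 10, 10))  # two boxes sharing half their width
score = mota(false_negatives=120, false_positives=60, id_switches=20, num_gt=1000)
```

In a tracking-by-detection evaluation, a detection is typically counted as matching a ground-truth box only when their IoU exceeds the chosen threshold, so changing that threshold shifts the false-positive and false-negative counts that enter MOTA.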