Effect of datasets size on the machine learning performance of the bagworm, Metisa plana (Walker) infestation using UAV remote sensing

Bibliographic Details
Main Authors: Mohd Johari, Siti Nurul Afiah, Khairunniza-Bejo, Siti, Mohamed Shariff, Abdul Rashid, Husin, Nur Azuan, Mohd Masri, Mohamed Mazmira, Kamarudin, Noorhazwani
Format: Article
Language: English
Published: Springer Science and Business Media LLC 2024
Online Access: http://psasir.upm.edu.my/id/eprint/117901/
http://psasir.upm.edu.my/id/eprint/117901/1/117901.pdf
Description
Summary: A leaf-eating pest, Metisa plana (Lepidoptera: Psychidae), can cause 10–13% leaf defoliation and up to 40% crop losses, which would have a significant detrimental economic effect on Malaysian oil palm yield. A manual census was carried out to measure the current level of infestation; however, it became time-consuming when covering a large area. Unmanned aerial vehicles (UAVs) were chosen as the solution because they allow rapid assessment of the severity of the bagworm infestation. Nevertheless, there is a greater chance of imbalanced data when employing UAV imagery, which may be a problem when determining the degree of infestation. Therefore, this study evaluated the impact of both balanced and imbalanced infestation level data on machine learning classification performance using three combinations of vegetation indices: NDVI-NDRE, NDVI-GNDVI and NDRE-GNDVI. Resampling was carried out using random oversampling (ROS), the synthetic minority oversampling technique (SMOTE), random undersampling (RUS), 3-interval undersampling and 5-interval undersampling. Results showed that the best performance, 86.84% successful classification with a 100% F1-score, was obtained using the imbalanced data from 3-interval undersampling. Fine KNN consistently performed well in classifying all infestation levels with the NDVI-NDRE combination across all datasets. The results unequivocally show that the 66.67% reduction in sample size increases the chances of successful classification, even in situations where the data are imbalanced.
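
The vegetation indices and resampling schemes named in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. The index formulas are the standard normalized-difference definitions; the function names, the band arguments and the toy data are hypothetical. In particular, reading "3-interval undersampling" as keeping every third sample is an assumption, chosen only because it reproduces the 66.67% reduction quoted in the summary. SMOTE is omitted because it needs synthetic interpolation between neighbouring samples.

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b) element-wise."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a - b) / (a + b)


def vegetation_indices(nir, red, red_edge, green):
    """Standard definitions of the three indices named in the summary."""
    return {
        "NDVI": normalized_difference(nir, red),
        "NDRE": normalized_difference(nir, red_edge),
        "GNDVI": normalized_difference(nir, green),
    }


def random_resample(X, y, mode, seed=0):
    """Balance classes by random oversampling ("over") or undersampling ("under")."""
    if mode not in ("over", "under"):
        raise ValueError("mode must be 'over' or 'under'")
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max() if mode == "over" else counts.min()
    picked = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        if mode == "over" and idx.size < target:
            # Keep every original sample, then duplicate random ones up to target.
            extra = rng.choice(idx, size=target - idx.size, replace=True)
            idx = np.concatenate([idx, extra])
        elif mode == "under" and idx.size > target:
            # Keep a random subset of size target, without duplicates.
            idx = rng.choice(idx, size=target, replace=False)
        picked.append(idx)
    picked = np.concatenate(picked)
    return X[picked], y[picked]


def interval_undersample(X, y, k):
    """ASSUMED reading of 'k-interval undersampling': keep every k-th sample."""
    X, y = np.asarray(X), np.asarray(y)
    return X[::k], y[::k]


# Toy data (hypothetical): 12 samples, 9 of class 0 and 3 of class 1.
X_demo = np.arange(24, dtype=float).reshape(12, 2)
y_demo = np.array([0] * 9 + [1] * 3)

X_over, y_over = random_resample(X_demo, y_demo, "over")
X_under, y_under = random_resample(X_demo, y_demo, "under")
X_int, y_int = interval_undersample(X_demo, y_demo, 3)
reduction = 1 - len(y_int) / len(y_demo)  # fraction of samples removed
```

Under this reading, keeping one sample in three removes two thirds of the data, and the classes stay imbalanced because every class is thinned at the same rate.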