Evaluation of Three Feature Dimension Reduction Techniques for Machine Learning-Based Crop Yield Prediction Models

Machine learning (ML) has been widely used worldwide to develop crop yield forecasting models. However, it is still challenging to identify the most critical features from a dataset. Although either feature selection (FS) or feature extraction (FX) techniques have been employed, no research compares...


Bibliographic Details
Main Authors: Pham, Hoa Thi, Awange, Joseph, Kuhn, Michael
Format: Journal Article
Language: English
Published: MDPI 2022
Subjects:
Online Access: http://hdl.handle.net/20.500.11937/91904
author Pham, Hoa Thi
Awange, Joseph
Kuhn, Michael
building Curtin Institutional Repository
collection Online Access
description Machine learning (ML) has been widely used worldwide to develop crop yield forecasting models. However, it is still challenging to identify the most critical features from a dataset. Although either feature selection (FS) or feature extraction (FX) techniques have been employed, no research compares their performance or, more importantly, the benefits of combining both methods. Therefore, this paper proposes a framework that uses no feature reduction (All-F) as a baseline to investigate the performance of FS, FX, and a combination of both (FSX). The case study employs the vegetation condition index (VCI) and temperature condition index (TCI) to develop 21 rice yield forecasting models for eight sub-regions in Vietnam based on ML methods, namely linear, support vector machine (SVM), decision tree (Tree), artificial neural network (ANN), and Ensemble. The results reveal that FSX takes full advantage of both FS and FX: FSX-based models perform best in 18 of the 21 cases, compared with 2 for FS-based and 1 for FX-based models. These FSX-, FS-, and FX-based models improve on All-F-based models by 21% on average and by up to 60% in terms of RMSE. Furthermore, the 21 best models are based on Ensemble (13 models), Tree (6 models), linear (1 model), and ANN (1 model). These findings highlight the significant role of FS, FX, and especially FSX, coupled with a wide range of ML algorithms (especially Ensemble), in enhancing the accuracy of crop yield prediction.
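The RMSE-based comparison against the All-F baseline described above can be illustrated with a minimal sketch. It is not the authors' code: the yield values and predictions are made-up toy numbers, the helper names `rmse` and `improvement_pct` are hypothetical, and the percentage is assumed to be the relative reduction in RMSE with respect to All-F.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def improvement_pct(rmse_baseline, rmse_variant):
    """Relative RMSE reduction versus the baseline, in percent (positive = better)."""
    return 100.0 * (rmse_baseline - rmse_variant) / rmse_baseline

# Hypothetical observed yields and predictions (toy values, NOT results from the paper).
observed = [5.0, 6.0, 7.0, 8.0]
predictions = {
    "All-F": [7.0, 8.0, 5.0, 6.0],  # baseline: no feature reduction
    "FS":    [6.0, 7.0, 6.0, 7.0],  # feature selection only
    "FX":    [6.5, 7.5, 5.5, 6.5],  # feature extraction only
    "FSX":   [5.5, 6.5, 6.5, 7.5],  # selection combined with extraction
}

# RMSE of each variant, then its improvement relative to the All-F baseline.
scores = {name: rmse(observed, pred) for name, pred in predictions.items()}
baseline = scores["All-F"]
gains = {
    name: improvement_pct(baseline, score)
    for name, score in scores.items()
    if name != "All-F"
}
best = min(scores, key=scores.get)  # variant with the lowest RMSE

for name, score in scores.items():
    print(f"{name}: RMSE = {score:.2f}")
for name, gain in gains.items():
    print(f"{name} improves on All-F by {gain:.0f}%")
print("best variant:", best)
```

Averaging such per-model percentages over all models would give a figure analogous to the reported mean improvement, assuming the same definition is used.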
first_indexed 2025-11-14T11:37:49Z
format Journal Article
id curtin-20.500.11937-91904
institution Curtin University Malaysia
institution_category Local University
language English
last_indexed 2025-11-14T11:37:49Z
publishDate 2022
publisher MDPI
recordtype eprints
repository_type Digital Repository
spelling curtin-20.500.11937-91904 2023-06-07T03:56:03Z 2022 Journal Article http://hdl.handle.net/20.500.11937/91904 10.3390/s22176609 English http://creativecommons.org/licenses/by/4.0/ MDPI fulltext
title Evaluation of Three Feature Dimension Reduction Techniques for Machine Learning-Based Crop Yield Prediction Models
topic Science & Technology
Physical Sciences
Technology
Chemistry, Analytical
Engineering, Electrical & Electronic
Instruments & Instrumentation
Chemistry
Engineering
feature selection
feature extraction
machine learning
crop yield
VCI
TCI
VEGETATION HEALTH INDEXES
FEATURE-SELECTION
NEURAL-NETWORKS
DROUGHT
Algorithms
Forecasting
Machine Learning
Neural Networks, Computer
Support Vector Machine
url http://hdl.handle.net/20.500.11937/91904