A review of feature selection methods on diabetes mellitus classification

Diabetes is a leading cause of death in the United States and leads to serious health complications. In recent decades, artificial intelligence technology and its subfield, machine learning, have been increasingly utilized to aid in disease diagnosis. Machine learning methods must be robust enough t...

Bibliographic Details
Main Authors: Nur Farahaina, Idris; Mohd Arfian, Ismail; Shahreen, Kasim; Rohayanti, Hassan; Deshinta Arrova Dewi, .; Abdullah Munzir, Mohd Fauzi; Rahmat, Hidayat
Format: Article
Language: English
Published: Indonesian Society for Knowledge and Human Development 2025
Subjects: QA75 Electronic computers. Computer science; RA Public aspects of medicine
Online Access: https://umpir.ump.edu.my/id/eprint/45191/
author Nur Farahaina, Idris
Mohd Arfian, Ismail
Shahreen, Kasim
Rohayanti, Hassan
Deshinta Arrova Dewi, .
Abdullah Munzir, Mohd Fauzi
Rahmat, Hidayat
building UMP Institutional Repository
collection Online Access
description Diabetes is a leading cause of death in the United States and leads to serious health complications. In recent decades, artificial intelligence and its subfield, machine learning, have been increasingly used to aid disease diagnosis. Machine learning methods must be robust enough to handle the variability in diabetes datasets, which often encompass diverse patient demographics, clinical characteristics, and environmental factors. This motivates researchers to develop feature selection methods that complement machine learning methods, thereby reducing time and complexity. However, feature selection may reduce classification accuracy by inadvertently removing essential features, or it may increase the time required because of repeated evaluation. Hence, thorough reviews of feature selection methods for diabetes classification are being conducted to evaluate their effectiveness. There are three primary categories of feature selection methods: embedded, wrapper, and filter. Each category has distinct mechanisms and effects on the classification process. This study reviewed feature selection methods in each category, such as Random Forest (embedded), the Chi-Square test (filter), and Recursive Feature Elimination (wrapper). The Chi-Square test is efficient only with categorical features; Random Forest is effective but adds complexity and time because of its ensemble nature; and Recursive Feature Elimination produces the best performance but is not well suited to high-dimensional data. The findings indicate that Recursive Feature Elimination is the more suitable method for diabetes classification, as it is fast and yields good performance.
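The filter approach mentioned in the description can be illustrated with a minimal sketch, assuming purely categorical features and a binary class label. It computes the Pearson chi-square statistic for each feature against the label and ranks features by that score. The function names (chi_square_score, rank_features), the feature names, and the toy data are hypothetical and are not taken from the reviewed article.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic for one categorical feature vs. class labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))  # contingency-table cell counts
    feature_totals = Counter(feature)         # row totals
    label_totals = Counter(labels)            # column totals
    score = 0.0
    for f_val, f_count in feature_totals.items():
        for l_val, l_count in label_totals.items():
            # Count expected in this cell if feature and label were independent.
            expected = f_count * l_count / n
            score += (observed[(f_val, l_val)] - expected) ** 2 / expected
    return score


def rank_features(columns, labels):
    """Return feature names ordered from highest to lowest chi-square score."""
    scores = {name: chi_square_score(values, labels)
              for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical toy data (assumption for illustration only).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "high_glucose": ["yes", "yes", "yes", "yes", "no", "no", "no", "no"],
    "smoker": ["yes", "no", "yes", "no", "yes", "no", "yes", "no"],
}
ranking, scores = rank_features(columns, labels)
print(ranking, scores)
```

A filter of this kind scores each feature once, independently of any classifier, which is why it is cheap; a wrapper such as Recursive Feature Elimination instead retrains a model repeatedly while discarding the weakest features.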
first_indexed 2025-11-15T03:59:18Z
format Article
id ump-45191
institution Universiti Malaysia Pahang
institution_category Local University
language English
last_indexed 2025-11-15T03:59:18Z
publishDate 2025
publisher Indonesian Society for Knowledge and Human Development
recordtype eprints
repository_type Digital Repository
spelling ump-45191 2025-07-28T04:12:39Z https://umpir.ump.edu.my/id/eprint/45191/
Indonesian Society for Knowledge and Human Development 2025-01 Article PeerReviewed pdf en cc_by_sa_4 https://umpir.ump.edu.my/id/eprint/45191/1/A%20review%20of%20feature%20selection%20methods%20on%20diabetes%20mellitus%20classification.pdf
Nur Farahaina, Idris and Mohd Arfian, Ismail and Shahreen, Kasim and Rohayanti, Hassan and Deshinta Arrova Dewi, . and Abdullah Munzir, Mohd Fauzi and Rahmat, Hidayat (2025) A review of feature selection methods on diabetes mellitus classification. International Journal on Advanced Science, Engineering and Information Technology, 15 (3). pp. 686-692. ISSN 2088-5334. (Published)
https://doi.org/10.18517/ijaseit.15.3.12652
title A review of feature selection methods on diabetes mellitus classification
topic QA75 Electronic computers. Computer science
RA Public aspects of medicine
url https://umpir.ump.edu.my/id/eprint/45191/