A review of feature selection methods on diabetes mellitus classification
Diabetes is a leading cause of death in the United States and leads to serious health complications. In recent decades, artificial intelligence technology and its subfield, machine learning, have been increasingly utilized to aid in disease diagnosis. Machine learning methods must be robust enough t...
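| Main Authors: | Nur Farahaina, Idris; Mohd Arfian, Ismail; Shahreen, Kasim; Rohayanti, Hassan; Deshinta Arrova Dewi; Abdullah Munzir, Mohd Fauzi; Rahmat, Hidayat |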
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Indonesian Society for Knowledge and Human Development, 2025 |
| Subjects: | QA75 Electronic computers. Computer science; RA Public aspects of medicine |
| Online Access: | https://umpir.ump.edu.my/id/eprint/45191/ |

| Field | Value |
|---|---|
| author | Nur Farahaina, Idris; Mohd Arfian, Ismail; Shahreen, Kasim; Rohayanti, Hassan; Deshinta Arrova Dewi; Abdullah Munzir, Mohd Fauzi; Rahmat, Hidayat |
| author_sort | Nur Farahaina, Idris |
| building | UMP Institutional Repository |
| collection | Online Access |
| description | Diabetes is a leading cause of death in the United States and leads to serious health complications. In recent decades, artificial intelligence and its subfield, machine learning, have been increasingly used to aid disease diagnosis. Machine learning methods must be robust enough to handle the variability in diabetes datasets, which often encompass diverse patient demographics, clinical characteristics, and environmental factors. This motivates researchers to develop suitable feature selection methods that complement machine learning methods, thereby reducing time and complexity. However, feature selection may reduce classification accuracy by inadvertently removing essential features, or it may increase the time required because of repetitive processes during evaluation. Hence, thorough reviews of feature selection methods for diabetes classification are being conducted to evaluate their effectiveness. There are three primary categories of feature selection methods: embedded, wrapper, and filter methods. Each category has distinct mechanisms and effects on the classification process. This study reviewed feature selection methods in each category, such as Random Forest (embedded), the Chi-Square test (filter), and Recursive Feature Elimination (wrapper). The Chi-Square test is efficient only with categorical features; Random Forest is effective but adds complexity and time because of its ensemble nature; and Recursive Feature Elimination produces the best performance but is not well suited to high-dimensional data. The findings indicate that Recursive Feature Elimination is more suitable for diabetes classification, as it is fast and yields good performance. |
| first_indexed | 2025-11-15T03:59:18Z |
| format | Article |
| id | ump-45191 |
| institution | Universiti Malaysia Pahang |
| institution_category | Local University |
| language | English |
| last_indexed | 2025-11-15T03:59:18Z |
| publishDate | 2025 |
| publisher | Indonesian Society for Knowledge and Human Development |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | ump-45191; 2025-07-28T04:12:39Z; https://umpir.ump.edu.my/id/eprint/45191/; Indonesian Society for Knowledge and Human Development; 2025-01; Article; PeerReviewed; pdf; en; cc_by_sa_4; https://umpir.ump.edu.my/id/eprint/45191/1/A%20review%20of%20feature%20selection%20methods%20on%20diabetes%20mellitus%20classification.pdf; Nur Farahaina, Idris and Mohd Arfian, Ismail and Shahreen, Kasim and Rohayanti, Hassan and Deshinta Arrova Dewi and Abdullah Munzir, Mohd Fauzi and Rahmat, Hidayat (2025) A review of feature selection methods on diabetes mellitus classification. International Journal on Advanced Science, Engineering and Information Technology, 15 (3). pp. 686-692. ISSN 2088-5334. (Published); https://doi.org/10.18517/ijaseit.15.3.12652 |
| title | A review of feature selection methods on diabetes mellitus classification |
| title_sort | review of feature selection methods on diabetes mellitus classification |
| topic | QA75 Electronic computers. Computer science; RA Public aspects of medicine |
| url | https://umpir.ump.edu.my/id/eprint/45191/ |
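
The description notes that the Chi-Square test, a filter method, is efficient only with categorical features. As a rough illustration of how such a filter could rank features, the sketch below computes the Pearson chi-square statistic between each categorical feature and the class label. The helper names (`chi_square_statistic`, `rank_features`), the feature names, and the toy data are hypothetical and are not taken from the reviewed article.

```python
from collections import Counter


def chi_square_statistic(feature, labels):
    """Pearson chi-square statistic for one categorical feature vs. class labels."""
    n = len(feature)
    observed = Counter(zip(feature, labels))  # counts per (category, class) cell
    feature_totals = Counter(feature)         # row totals
    label_totals = Counter(labels)            # column totals
    stat = 0.0
    for category, cat_count in feature_totals.items():
        for cls, cls_count in label_totals.items():
            # Expected count if the feature and the class were independent.
            expected = cat_count * cls_count / n
            stat += (observed[(category, cls)] - expected) ** 2 / expected
    return stat


def rank_features(columns, labels):
    """Score every feature and sort from most to least class-dependent."""
    scores = {
        name: chi_square_statistic(values, labels)
        for name, values in columns.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 1 = diabetic, 0 = non-diabetic.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "family_history": ["yes", "yes", "yes", "no", "no", "no", "no", "yes"],
    "smoker": ["yes", "no", "yes", "no", "yes", "no", "yes", "no"],
}

ranking = rank_features(columns, labels)
for name, score in ranking:
    print(f"{name}: chi2 = {score:.2f}")
```

A filter of this kind scores each feature independently of any classifier, which is why it is cheap to run; continuous measurements would need to be binned into categories first, an extra step this sketch does not cover.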