Comparative investigation of bagging enhanced machine learning for early detection of HCV infections using class imbalance technique with feature selection

Around 1.5 million new cases of Hepatitis C Virus (HCV) are diagnosed globally each year (World Health Organization, 2023). Consequently, there is a pressing need for early diagnostic methods for HCV. This study investigates the prognostic accuracy of several ensemble machine learning (ML) models fo...


Bibliographic Details
Main Authors: Tusher, Ekramul Haque, Mohd Arfian, Ismail, Akib, Abdullah, Gabralla, Lubna A., Ashraf Osman, Ibrahim, Hafizan, Mat Som, Muhammad Akmal, Remli
Format: Article
Language: English
Published: Public Library of Science 2025
Subjects: QA75 Electronic computers. Computer science; QA76 Computer software; RA Public aspects of medicine
Online Access: http://umpir.ump.edu.my/id/eprint/45086/
http://umpir.ump.edu.my/id/eprint/45086/1/Comparative%20investigation%20of%20bagging%20enhanced%20machine%20learning.pdf
_version_ 1848827252375027712
author Tusher, Ekramul Haque
Mohd Arfian, Ismail
Akib, Abdullah
Gabralla, Lubna A.
Ashraf Osman, Ibrahim
Hafizan, Mat Som
Muhammad Akmal, Remli
author_sort Tusher, Ekramul Haque
building UMP Institutional Repository
collection Online Access
description Around 1.5 million new cases of Hepatitis C Virus (HCV) are diagnosed globally each year (World Health Organization, 2023). Consequently, there is a pressing need for early diagnostic methods for HCV. This study investigates the prognostic accuracy of several ensemble machine learning (ML) models for diagnosing HCV infection. The study utilizes a dataset comprising demographic information of 615 individuals suspected of having HCV infection. Additionally, the research employs oversampling and undersampling techniques to address class imbalances in the dataset and conducts feature reduction using the F-test in one-way analysis of variance. Ensemble ML methods, including Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), Logistic Regression (LR), Random Forest (RF), Naïve Bayes (NB), and Decision Tree (DT), are used to predict HCV infection. The performance of these ensemble methods is evaluated using metrics such as accuracy, recall, precision, F1 score, G-mean, balanced accuracy, cross-validation (CV), area under the curve (AUC), standard deviation, and error rate. Compared with previous studies, the Bagging k-NN model demonstrated superior performance under oversampling conditions, achieving 98.37% accuracy, 98.23% CV score, 97.67% precision, 97.93% recall, 98.18% selectivity, 97.79% F1 score, 98.06% balanced accuracy, 98.05% G-mean, a 1.63% error rate, 0.98 AUC, and a standard deviation of 0.192. This study highlights the potential of ensemble ML approaches in improving the diagnosis of HCV. The findings provide a foundation for developing accurate predictive methods for HCV diagnosis.
first_indexed 2025-11-15T03:57:45Z
format Article
id ump-45086
institution Universiti Malaysia Pahang
institution_category Local University
language English
last_indexed 2025-11-15T03:57:45Z
publishDate 2025
publisher Public Library of Science
recordtype eprints
repository_type Digital Repository
spelling ump-45086 2025-07-15T03:20:35Z http://umpir.ump.edu.my/id/eprint/45086/ Comparative investigation of bagging enhanced machine learning for early detection of HCV infections using class imbalance technique with feature selection Tusher, Ekramul Haque Mohd Arfian, Ismail Akib, Abdullah Gabralla, Lubna A. Ashraf Osman, Ibrahim Hafizan, Mat Som Muhammad Akmal, Remli QA75 Electronic computers. Computer science QA76 Computer software RA Public aspects of medicine Public Library of Science 2025-06-26 Article PeerReviewed pdf en cc_by_4 http://umpir.ump.edu.my/id/eprint/45086/1/Comparative%20investigation%20of%20bagging%20enhanced%20machine%20learning.pdf Tusher, Ekramul Haque and Mohd Arfian, Ismail and Akib, Abdullah and Gabralla, Lubna A. and Ashraf Osman, Ibrahim and Hafizan, Mat Som and Muhammad Akmal, Remli (2025) Comparative investigation of bagging enhanced machine learning for early detection of HCV infections using class imbalance technique with feature selection. PLoS ONE, 20 (e0326488). pp. 1-44. ISSN 1932-6203. (Published) https://doi.org/10.1371/journal.pone.0326488
spellingShingle QA75 Electronic computers. Computer science
QA76 Computer software
RA Public aspects of medicine
Tusher, Ekramul Haque
Mohd Arfian, Ismail
Akib, Abdullah
Gabralla, Lubna A.
Ashraf Osman, Ibrahim
Hafizan, Mat Som
Muhammad Akmal, Remli
Comparative investigation of bagging enhanced machine learning for early detection of HCV infections using class imbalance technique with feature selection
title Comparative investigation of bagging enhanced machine learning for early detection of HCV infections using class imbalance technique with feature selection
title_sort comparative investigation of bagging enhanced machine learning for early detection of hcv infections using class imbalance technique with feature selection
topic QA75 Electronic computers. Computer science
QA76 Computer software
RA Public aspects of medicine
url http://umpir.ump.edu.my/id/eprint/45086/
http://umpir.ump.edu.my/id/eprint/45086/1/Comparative%20investigation%20of%20bagging%20enhanced%20machine%20learning.pdf
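
Illustrative note: the workflow summarized in the description field (class rebalancing, feature reduction with the one-way ANOVA F-test, then a bagged k-NN classifier) can be sketched in standard-library Python as below. This is not the authors' code. The toy data, the function names (anova_f, select_top_features, random_oversample, knn_predict, bagging_knn_predict), the use of plain random oversampling, the order of the steps, and the settings k=3 and 11 estimators are all assumptions made for illustration; the record does not state which resampling method or hyperparameters the study used.

```python
import math
import random
from collections import Counter


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    if k < 2:
        return 0.0  # F is undefined with a single class; treat as uninformative
    grand = sum(values) / n
    means = {lab: sum(g) / len(g) for lab, g in groups.items()}
    ss_between = sum(len(g) * (means[lab] - grand) ** 2 for lab, g in groups.items())
    ss_within = sum((v - means[lab]) ** 2 for lab, g in groups.items() for v in g)
    if ss_within == 0:
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_features(X, y, k):
    """Indices of the k features with the largest F statistic."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def random_oversample(X, y, rng):
    """Duplicate randomly chosen minority rows until every class matches the largest."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def knn_predict(X_train, y_train, x, k):
    """Majority label among the k nearest training rows (Euclidean distance)."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    return Counter(y_train[i] for i in order[:k]).most_common(1)[0][0]


def bagging_knn_predict(X_train, y_train, x, n_estimators=11, k=3, seed=0):
    """Majority vote over k-NN models, each fit on a bootstrap resample."""
    rng = random.Random(seed)
    n = len(X_train)
    votes = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        votes.append(knn_predict([X_train[i] for i in idx],
                                 [y_train[i] for i in idx], x, k))
    return Counter(votes).most_common(1)[0][0]


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.0, 3.0], [0.3, 4.0],
     [0.2, 2.0], [0.1, 2.5], [5.0, 3.0], [5.2, 1.5]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y, random.Random(42))
keep = select_top_features(X_bal, y_bal, k=1)
X_sel = [[row[j] for j in keep] for row in X_bal]

pred_pos = bagging_knn_predict(X_sel, y_bal, [4.8])
pred_neg = bagging_knn_predict(X_sel, y_bal, [0.15])
print(keep, dict(Counter(y_bal)), pred_pos, pred_neg)
```

In a real evaluation, resampling is normally applied to the training split only, so that duplicated minority rows do not leak into the data used for scoring; the record does not say how the study handled this.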