A comparative analysis of machine learning algorithms for diabetes prediction

Diabetes mellitus is a chronic metabolic disorder with significant global health implications. The accurate prediction and detection of diabetes using artificial intelligence are crucial for preventing complications and improving patient outcomes. This study focuses on comparing the performance of t...


Bibliographic Details
Main Authors: Alansari, Waseem Abdulmahdi, Masnizah Mohd
Format: Article
Language: English
Published: Penerbit Universiti Kebangsaan Malaysia 2024
Online Access: http://journalarticle.ukm.my/25041/
http://journalarticle.ukm.my/25041/1/253%20%E2%80%93%20265.pdf
author Alansari, Waseem Abdulmahdi
Masnizah Mohd
author_sort Alansari, Waseem Abdulmahdi
building UKM Institutional Repository
collection Online Access
description Diabetes mellitus is a chronic metabolic disorder with significant global health implications. The accurate prediction and detection of diabetes using artificial intelligence are crucial for preventing complications and improving patient outcomes. This study compares the performance of three machine learning algorithms, namely Naive Bayes (NB), Support Vector Machines (SVM), and Random Forest (RF), in predicting diabetes on two datasets, the Pima Indians Diabetes Dataset (PIDD) and the Diabetes 2019 Dataset (DD2019), with the aim of identifying the most accurate and effective algorithm for diabetes prediction. Nine attributes were used for the prediction of diabetes: Age, Blood pressure, Skin thickness, Glucose, Diabetes pedigree function, Pregnancy, BMI, Insulin level, and Outcome. The methodology involves data collection, pre-processing, and training the algorithms using k-fold cross-validation. The results indicate that pre-processing steps and dataset characteristics significantly affect algorithm performance. The RF model consistently achieved the highest accuracy. On PIDD, RF attained the maximum accuracy of 77%. On DD2019, RF and SVM were the most accurate algorithms, achieving 96.65% and 93.93%, respectively. The study contributes insights into the importance of pre-processing and feature selection in improving algorithm performance. The findings have implications for developing accurate predictive models and improving diabetes detection.
first_indexed 2025-11-15T01:02:55Z
format Article
id oai:generic.eprints.org:25041
institution Universiti Kebangsaan Malaysia
institution_category Local University
language English
last_indexed 2025-11-15T01:02:55Z
publishDate 2024
publisher Penerbit Universiti Kebangsaan Malaysia
recordtype eprints
repository_type Digital Repository
spelling oai:generic.eprints.org:25041 2025-04-08T04:32:53Z http://journalarticle.ukm.my/25041/ A comparative analysis of machine learning algorithms for diabetes prediction Alansari, Waseem Abdulmahdi Masnizah Mohd
Penerbit Universiti Kebangsaan Malaysia 2024-10-13 Article PeerReviewed application/pdf en http://journalarticle.ukm.my/25041/1/253%20%E2%80%93%20265.pdf Alansari, Waseem Abdulmahdi and Masnizah Mohd (2024) A comparative analysis of machine learning algorithms for diabetes prediction. Asia-Pacific Journal of Information Technology and Multimedia, 13 (2). pp. 253-265. ISSN 2289-2192 https://www.ukm.my/apjitm/
title A comparative analysis of machine learning algorithms for diabetes prediction
title_sort comparative analysis of machine learning algorithms for diabetes prediction
url http://journalarticle.ukm.my/25041/
http://journalarticle.ukm.my/25041/1/253%20%E2%80%93%20265.pdf
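
The description above mentions training the algorithms with k-fold cross-validation. As a rough illustration of that evaluation loop only, the following dependency-free Python sketch runs k-fold cross-validation over a minimal Gaussian Naive Bayes classifier. It is not the authors' code: the synthetic data, the two stand-in features, the choice of k = 5, and all function names are assumptions made for the example, and SVM and RF are omitted for brevity. The accuracy it prints has no relation to the figures reported in the article.

```python
import math
import random
from statistics import mean, pvariance


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


class GaussianNB:
    """Minimal Gaussian Naive Bayes for numeric features (illustrative only)."""

    def fit(self, X, y):
        self.stats = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            prior = len(rows) / len(X)
            # Per-feature mean and variance; a small constant avoids zero variance.
            params = [(mean(col), pvariance(col) + 1e-9) for col in zip(*rows)]
            self.stats[label] = (prior, params)
        return self

    def predict_one(self, x):
        def log_posterior(label):
            prior, params = self.stats[label]
            total = math.log(prior)
            for value, (mu, var) in zip(x, params):
                total += -0.5 * math.log(2 * math.pi * var)
                total -= (value - mu) ** 2 / (2 * var)
            return total

        # Pick the class with the highest (unnormalized) log posterior.
        return max(self.stats, key=log_posterior)


def cross_val_accuracy(X, y, k=5, seed=0):
    """Return the mean held-out accuracy over k folds."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = GaussianNB().fit([X[j] for j in train_idx],
                                 [y[j] for j in train_idx])
        correct = sum(model.predict_one(X[j]) == y[j] for j in test_idx)
        scores.append(correct / len(test_idx))
    return mean(scores)


def make_toy_data(n=200, seed=1):
    """Synthetic two-class data standing in for a diabetes dataset (not real records)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Two hypothetical features (glucose-like, BMI-like), shifted up for class 1.
        X.append([rng.gauss(100 + 40 * label, 15), rng.gauss(27 + 6 * label, 4)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(f"Mean 5-fold accuracy: {cross_val_accuracy(X, y):.3f}")
```

Comparing several algorithms as the study does would mean calling the same fold loop once per model and reporting each mean accuracy side by side, ideally with identical folds so the comparison is fair.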