ExtraImpute: a novel machine learning method for missing data imputation

Missing values are a common occurrence in healthcare datasets. Their presence usually leads to undesirable results when data are analyzed with machine learning methods. Recently, researchers have proposed several imputation approaches to deal with missing values in real-...


Bibliographic Details
Main Authors: Alabadla, Mustafa; Sidi, Fatimah; Ishak, Iskandar; Ibrahim, Hamidah; Affendey, Lilly Suriani; Hamdan, Hazlina
Format: Article
Published: Engineering and Technology Publishing 2022
Online Access: http://psasir.upm.edu.my/id/eprint/101446/
author Alabadla, Mustafa
Sidi, Fatimah
Ishak, Iskandar
Ibrahim, Hamidah
Affendey, Lilly Suriani
Hamdan, Hazlina
building UPM Institutional Repository
collection Online Access
description Missing values are a common occurrence in healthcare datasets. Their presence usually leads to undesirable results when data are analyzed with machine learning methods. Recently, researchers have proposed several imputation approaches to deal with missing values in real-world datasets. Moreover, data imputation helps build high-performance machine learning models that discover patterns in healthcare data and provide insights for higher-quality decision-making. In this paper, we propose a new imputation approach, named ExtraImpute, that uses Extremely Randomized Trees (Extra Trees), an ensemble machine learning method, to handle numerical missing values in a healthcare context. The proposed method can impute both continuous and discrete data features. It imputes each missing value in a feature by predicting it from the other observed values in the dataset. To evaluate the effectiveness of our algorithm, several experiments are conducted on five benchmark healthcare datasets, and the results are compared with those of other commonly used imputation methods, viz. missForest, KNNImpute, Multivariate Imputation by Chained Equations (MICE), and SoftImpute. The results were validated using Root Mean Square Error (RMSE) and Coefficient of Determination (R²) scores. From these results, it was observed that our proposed algorithm outperforms the existing imputation techniques.
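The abstract above describes the general idea only: predict each missing value from the other observed values using Extra Trees, then score the imputations with RMSE and R². This record does not give the details of the ExtraImpute algorithm, so the following is a minimal sketch of that idea and not the authors' implementation. It swaps in scikit-learn's IterativeImputer with an ExtraTreesRegressor as the per-feature predictor. The dataset (scikit-learn's bundled diabetes data, not necessarily one of the paper's five benchmarks), the 10% missing rate, the hyperparameters, and the helper names mask_values and impute_with_extra_trees are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.metrics import mean_squared_error, r2_score


def mask_values(X, missing_rate, rng):
    """Hypothetical helper: hide a random fraction of entries as NaN."""
    mask = rng.random(X.shape) < missing_rate
    X_missing = X.copy()
    X_missing[mask] = np.nan
    return X_missing, mask


def impute_with_extra_trees(X_missing, seed=0):
    """Hypothetical helper: impute each feature from the others with Extra Trees.

    This is a stand-in for the approach described in the abstract,
    not the published ExtraImpute algorithm.
    """
    imputer = IterativeImputer(
        estimator=ExtraTreesRegressor(n_estimators=50, random_state=seed),
        max_iter=5,          # assumed value; may stop before convergence
        random_state=seed,
    )
    return imputer.fit_transform(X_missing)


rng = np.random.default_rng(0)
X_true = load_diabetes().data              # complete numerical data (assumed example)
X_missing, mask = mask_values(X_true, missing_rate=0.1, rng=rng)
X_imputed = impute_with_extra_trees(X_missing)

# Score only the entries that were artificially hidden.
rmse = np.sqrt(mean_squared_error(X_true[mask], X_imputed[mask]))
r2 = r2_score(X_true[mask], X_imputed[mask])
print(f"RMSE on hidden entries: {rmse:.4f}")
print(f"R2 on hidden entries:   {r2:.4f}")
```

Scoring only the artificially hidden entries keeps the observed values, which the imputer leaves unchanged, from inflating the metrics. The same masking and scoring could be repeated with other imputers to reproduce the kind of comparison the abstract mentions.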
first_indexed 2025-11-15T13:34:53Z
format Article
id upm-101446
institution Universiti Putra Malaysia
institution_category Local University
last_indexed 2025-11-15T13:34:53Z
publishDate 2022
publisher Engineering and Technology Publishing
recordtype eprints
repository_type Digital Repository
spelling upm-101446 2023-10-06T23:13:02Z http://psasir.upm.edu.my/id/eprint/101446/ Engineering and Technology Publishing 2022 Article PeerReviewed
Alabadla, Mustafa and Sidi, Fatimah and Ishak, Iskandar and Ibrahim, Hamidah and Affendey, Lilly Suriani and Hamdan, Hazlina (2022) ExtraImpute: a novel machine learning method for missing data imputation. Journal of Advances in Information Technology, 13 (5). pp. 470-476. ISSN 1798-2340 http://www.jait.us/index.php?m=content&c=index&a=show&catid=221&id=1255 DOI: 10.12720/jait.13.5.470-476
title ExtraImpute: a novel machine learning method for missing data imputation
url http://psasir.upm.edu.my/id/eprint/101446/