Comparisons of various imputation methods for incomplete water quality data: a case study of the Langat River, Malaysia

In this study, the ability of numerous statistical and machine learning models to impute water quality data was investigated at three monitoring stations along the Langat River in Malaysia. Inconsistencies in the percentage of missing data between monitoring stations (varying from 20 percent (modera...

Bibliographic Details
Main Authors: Naeimah Mamat, Siti Fatin Mohd Razali
Format: Article
Language: English
Published: Penerbit Universiti Kebangsaan Malaysia 2023
Online Access: http://journalarticle.ukm.my/21963/
http://journalarticle.ukm.my/21963/1/kjt_18.pdf
_version_ 1848815482040221696
author Naeimah Mamat,
Siti Fatin Mohd Razali,
author_facet Naeimah Mamat,
Siti Fatin Mohd Razali,
author_sort Naeimah Mamat,
building UKM Institutional Repository
collection Online Access
description In this study, the ability of several statistical and machine learning models to impute water quality data was investigated at three monitoring stations along the Langat River in Malaysia. The main challenge was that the percentage of missing data differed between monitoring stations, ranging from 20 percent (moderate) to over 50 percent (high). The main objective was to select the best imputation method and to check whether the preferred method differs between stations. The paper compares seven imputation methods: Multiple Predictive Mean Matching (PMM), Multiple Random Forest Imputation (RF), Multiple Bayesian Linear Regression Imputation (BLR), Multiple Linear Regression (non-Bayesian) Imputation (LRNB), Multiple Classification and Regression Tree (CART), k-nearest neighbours (kNN) and Bootstrap-based Expectation Maximisation (EMB). Among the seven techniques, kNN produced consistently reliable results. All of its imputed data are rated ‘very good’ (NSE > 0.75). This was confirmed by |PBIAS| < 5.30 (all imputed data are ‘very good’) and KGE ≥ 0.87 (all imputations are rated ‘good’). Imputation performance is high at all three monitoring stations, with an index of agreement WI ≥ 0.94, despite the varying percentages of missing data. According to the findings, the kNN imputation approach outperforms the others and should be prioritised in practice. Future research with the existing methods could benefit from the addition of geographical data.
first_indexed 2025-11-15T00:50:40Z
format Article
id oai:generic.eprints.org:21963
institution Universiti Kebangsaan Malaysia
institution_category Local University
language English
last_indexed 2025-11-15T00:50:40Z
publishDate 2023
publisher Penerbit Universiti Kebangsaan Malaysia
recordtype eprints
repository_type Digital Repository
spelling oai:generic.eprints.org:21963 2023-07-27T05:58:45Z http://journalarticle.ukm.my/21963/ Comparisons of various imputation methods for incomplete water quality data: a case study of the Langat River, Malaysia Naeimah Mamat, Siti Fatin Mohd Razali,
Penerbit Universiti Kebangsaan Malaysia 2023 Article PeerReviewed application/pdf en http://journalarticle.ukm.my/21963/1/kjt_18.pdf Naeimah Mamat and Siti Fatin Mohd Razali (2023) Comparisons of various imputation methods for incomplete water quality data: a case study of the Langat River, Malaysia. Jurnal Kejuruteraan, 35 (1). pp. 191-201. ISSN 0128-0198 https://www.ukm.my/jkukm/volume-3501-2023/
spellingShingle Naeimah Mamat,
Siti Fatin Mohd Razali,
Comparisons of various imputation methods for incomplete water quality data: a case study of the Langat River, Malaysia
title Comparisons of various imputation methods for incomplete water quality data: a case study of the Langat River, Malaysia
title_sort comparisons of various imputation methods for incomplete water quality data: a case study of the langat river, malaysia
url http://journalarticle.ukm.my/21963/
http://journalarticle.ukm.my/21963/1/kjt_18.pdf
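
The description above rates the imputations with four goodness-of-fit measures: NSE, PBIAS, KGE and the index of agreement (WI). The following is a minimal sketch of how such scores are commonly computed from observed and imputed values. It assumes the widely used textbook definitions (Nash-Sutcliffe efficiency, percent bias with the observed-minus-imputed sign convention, the 2009 form of the Kling-Gupta efficiency, and Willmott's index of agreement); the record does not state which exact variants the paper used. All function names and sample values are hypothetical.

```python
import math
from statistics import mean, pstdev


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mo = mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mo) ** 2 for o in obs)
    return 1 - num / den


def pbias(obs, sim):
    """Percent bias. Assumed sign convention: positive means the imputed values run low."""
    return 100 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def kge(obs, sim):
    """Kling-Gupta efficiency (2009 form) from correlation, variability ratio and bias ratio."""
    mo, ms = mean(obs), mean(sim)
    so, ss = pstdev(obs), pstdev(sim)
    cov = mean([(o - mo) * (s - ms) for o, s in zip(obs, sim)])
    r = cov / (so * ss)   # Pearson correlation
    alpha = ss / so       # variability ratio
    beta = ms / mo        # bias ratio
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def willmott_index(obs, sim):
    """Willmott's index of agreement (WI): ranges from 0 to 1, 1 is perfect agreement."""
    mo = mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((abs(s - mo) + abs(o - mo)) ** 2 for o, s in zip(obs, sim))
    return 1 - num / den


# Hypothetical held-out values: true measurements versus their imputed counterparts.
observed = [1.0, 2.0, 3.0, 4.0]
imputed = [1.1, 1.9, 3.2, 3.8]

for name, fn in [("NSE", nse), ("PBIAS", pbias), ("KGE", kge), ("WI", willmott_index)]:
    print(name, round(fn(observed, imputed), 3))
```

In a comparison like the one described, such scores would typically be computed on values that were deliberately masked and then imputed, so that the true measurements are available for reference.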