Improved robust principal component analysis based on minimum regularized covariance determinant for the detection of high leverage points in high dimensional data (penambahbaikan analisis komponen utama berdasarkan penentu kovarian teratur minimum bagi mengecam titik tuasan tinggi untuk data dimensi tinggi)

This paper presents an extension of robust principal component analysis (ROBPCA), denoted IRPCA, that improves the accuracy of detecting high leverage points (HLPs) in high dimensional data (HDD). IRPCA uses principal component analysis (PCA) to reduce the dimension of the data...


Bibliographic Details
Main Authors: Midi, Habshah; Suhaiza, Jaaz; Mohd Aslam; Hani Syahida; Emi Amielda
Format: Article
Language: English
Published: Penerbit Universiti Kebangsaan Malaysia 2025
Online Access: http://psasir.upm.edu.my/id/eprint/120869/
http://psasir.upm.edu.my/id/eprint/120869/1/120869.pdf
author Midi, Habshah; Suhaiza, Jaaz; Mohd Aslam; Hani Syahida; Emi Amielda
building UPM Institutional Repository
collection Online Access
description This paper presents an extension of robust principal component analysis (ROBPCA), denoted IRPCA, that improves the accuracy of detecting high leverage points (HLPs) in high dimensional data (HDD). IRPCA uses principal component analysis (PCA) to reduce the dimension of the data set, and then obtains robust location and scatter estimates of the PC scores based on the Minimum Regularized Covariance Determinant (MRCD). Whereas ROBPCA detects HLPs with the robust score distance, the proposed IRPCA uses the robust Mahalanobis distance (RMD). The performance of IRPCA is compared with that of ROBPCA and of the Minimum Regularized Covariance Determinant and PCA-based method (MRCD-PCA) for identifying HLPs in HDD. The results indicate that all three methods detect HLPs very successfully, with no masking effect. Nonetheless, ROBPCA suffers from serious swamping when the percentage of HLPs is below 30%. The proposed IRPCA and MRCD-PCA perform similarly, with a very small swamping effect. However, the MRCD-PCA algorithm is quite cumbersome and requires a longer computational running time. The attractive feature of IRPCA is that its algorithm is simpler and very fast.
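The three steps named in the description (PCA reduction, robust location and scatter of the PC scores, robust Mahalanobis distance) can be sketched roughly as below. This is an illustrative sketch only, not the authors' implementation: scikit-learn's `MinCovDet` (the classical MCD) is used here as a stand-in for MRCD, and the function name `detect_hlps`, the choice of `n_components`, and the chi-square cutoff at the 0.975 quantile are assumptions that the record does not state.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet
from sklearn.decomposition import PCA


def detect_hlps(X, n_components, quantile=0.975, random_state=0):
    """Flag candidate high leverage points in the rows of X.

    Hypothetical helper: MCD replaces MRCD, and the cutoff rule
    is an assumed chi-square quantile, not taken from the paper.
    """
    # Step 1: reduce the dimension with PCA and keep the PC scores.
    scores = PCA(n_components=n_components).fit_transform(X)

    # Step 2: robust location and scatter of the PC scores
    # (MCD used as a stand-in for MRCD).
    robust = MinCovDet(random_state=random_state).fit(scores)

    # Step 3: robust Mahalanobis distance of each observation.
    # MinCovDet.mahalanobis returns squared distances.
    rmd = np.sqrt(robust.mahalanobis(scores))

    # Assumed cutoff: square root of a chi-square quantile with
    # n_components degrees of freedom.
    cutoff = np.sqrt(chi2.ppf(quantile, df=n_components))
    return rmd, rmd > cutoff
```

For a data matrix `X` with more columns than rows, `detect_hlps(X, n_components=3)` returns the distances and a boolean mask of flagged observations. Because the scores have only a few dimensions after the PCA step, MCD is computable here even though it would not be on the original high dimensional data.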
first_indexed 2025-11-15T14:49:08Z
format Article
id upm-120869
institution Universiti Putra Malaysia
institution_category Local University
language English
last_indexed 2025-11-15T14:49:08Z
publishDate 2025
publisher Penerbit Universiti Kebangsaan Malaysia
recordtype eprints
repository_type Digital Repository
spelling upm-120869 2025-10-14T04:09:13Z http://psasir.upm.edu.my/id/eprint/120869/ Article, PeerReviewed. Midi, Habshah; Suhaiza, Jaaz; Mohd Aslam; Hani Syahida; Emi Amielda (2025) Improved robust principal component analysis based on minimum regularized covariance determinant for the detection of high leverage points in high dimensional data (penambahbaikan analisis komponen utama berdasarkan penentu kovarian teratur minimum bagi mengecam titik tuasan tinggi untuk data dimensi tinggi). Sains Malaysiana, 54 (8), pp. 2087-2097. ISSN 0126-6039; eISSN 2735-0118. https://www.ukm.my/jsm/pdf_files/SM-PDF-54-8-2025/17.pdf DOI: 10.17576/jsm-2025-5408-17
url http://psasir.upm.edu.my/id/eprint/120869/
http://psasir.upm.edu.my/id/eprint/120869/1/120869.pdf