Improved robust principal component analysis based on minimum regularized covariance determinant for the detection of high leverage points in high dimensional data
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Penerbit Universiti Kebangsaan Malaysia, 2025 |
| Online Access: | http://journalarticle.ukm.my/26004/ http://journalarticle.ukm.my/26004/1/SML%2017.pdf |
| Summary: | This paper presents IRPCA, an extension of robust principal component analysis (ROBPCA) that improves the accuracy of detecting high leverage points (HLPs) in high dimensional data (HDD). IRPCA first applies principal component analysis (PCA) to reduce the dimension of the data set, then obtains robust location and scatter estimates of the PC scores based on the Minimum Regularized Covariance Determinant (MRCD). Whereas ROBPCA detects HLPs with the robust score distance, the proposed IRPCA uses the Robust Mahalanobis Distance (RMD). The performance of IRPCA is compared with that of ROBPCA and of the MRCD and PCA-based method (MRCD-PCA) for identifying HLPs in HDD. The results indicate that all three methods detect HLPs very successfully, with no masking effect. However, ROBPCA suffers from serious swamping when the percentage of HLPs is below 30%. The proposed IRPCA and MRCD-PCA perform similarly, with a very small swamping effect, but the MRCD-PCA algorithm is cumbersome and requires a longer running time. The attractive feature of IRPCA is that its algorithm is simpler and very fast. |
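
The pipeline described in the summary (PCA for dimension reduction, robust location and scatter of the PC scores, then RMD-based flagging) can be sketched roughly as follows. This is only an illustration under several assumptions, not the authors' implementation: the function names are hypothetical, the data are centered with the classical mean before the SVD, the robust step is a simplified concentration-step procedure on a regularized covariance standing in for the actual MRCD estimator, and the cutoff `median + 3 * MAD` of the distances, as well as the values of `k`, `h_frac`, and `rho`, are illustrative choices that the summary does not specify.

```python
import numpy as np

def pca_scores(X, k):
    """Scores on the first k principal components (assumption: classical mean centering)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def squared_distances(T, mu, K):
    """Squared Mahalanobis-type distances of the rows of T from mu under scatter K."""
    diff = T - mu
    return np.einsum("ij,ij->i", diff, np.linalg.solve(K, diff.T).T)

def regularized_robust_estimates(T, h_frac=0.75, rho=0.1, n_iter=50):
    """Simplified stand-in for MRCD (NOT the published algorithm):
    concentration steps on a covariance shrunk toward a scaled identity."""
    n, k = T.shape
    h = int(np.ceil(h_frac * n))
    # Initial h-subset: points closest to the coordinate-wise median in MAD units.
    med = np.median(T, axis=0)
    mad = 1.4826 * np.median(np.abs(T - med), axis=0)
    subset = np.argsort(np.sum(((T - med) / mad) ** 2, axis=1))[:h]
    for _ in range(n_iter):
        mu = T[subset].mean(axis=0)
        S = np.atleast_2d(np.cov(T[subset], rowvar=False))
        target = (np.trace(S) / k) * np.eye(k)      # assumed regularization target
        K = rho * target + (1.0 - rho) * S          # regularized scatter
        new_subset = np.argsort(squared_distances(T, mu, K))[:h]
        if np.array_equal(np.sort(new_subset), np.sort(subset)):
            break                                   # h-subset no longer changes
        subset = new_subset
    return mu, K

def detect_hlps(X, k=2):
    """Flag rows whose robust distance in PC-score space exceeds median + 3*MAD (assumed cutoff)."""
    T = pca_scores(X, k)
    mu, K = regularized_robust_estimates(T)
    rmd = np.sqrt(squared_distances(T, mu, K))
    med = np.median(rmd)
    cutoff = med + 3.0 * 1.4826 * np.median(np.abs(rmd - med))
    return rmd, cutoff, rmd > cutoff

# Synthetic high-dimensional example: n = 60 observations, p = 200 variables,
# with the first 6 rows shifted to act as high leverage points.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 200))
X[:6] += 10.0
rmd, cutoff, flags = detect_hlps(X, k=2)
print("flagged rows:", np.flatnonzero(flags))
```

Because the `median + 3 * MAD` cutoff scales with the distances themselves, the sketch omits the consistency factor that a full MCD-type estimator would apply to the scatter matrix. A chi-square-based cutoff would need that factor.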