Robust diagnostics and parameter estimation in linear regression for high dimensional data


Bibliographic Details
Main Author: Abdul Wahab, Siti Zahariah
Format: Thesis
Language: English
Published: 2023
Subjects: High-dimensional data; Robust statistics; Linear models (Statistics)
Online Access: http://psasir.upm.edu.my/id/eprint/118358/
http://psasir.upm.edu.my/id/eprint/118358/1/118358.pdf
_version_ 1848867755506270208
author Abdul Wahab, Siti Zahariah
author_facet Abdul Wahab, Siti Zahariah
author_sort Abdul Wahab, Siti Zahariah
building UPM Institutional Repository
collection Online Access
description Several methods for identifying high leverage points (HLPs) in high-dimensional data (HDD) have been put forth, including Robust Mahalanobis Distance (RMD) based on the Minimum Regularized Covariance Determinant (MRCD) and Robust Principal Component Analysis (ROBPCA). However, these methods suffer from masking and swamping effects when the number of predictor variables reaches 700 or more. To address this problem, a modified HLP detection method, called Robust Mahalanobis Distance based on the combination of the Minimum Regularized Covariance Determinant and Principal Component Analysis (RMD-MRCD-PCA), is proposed. Empirical evidence from simulation studies and real data shows that the RMD-MRCD-PCA method is very successful in detecting HLPs, with negligible masking and swamping effects. Classical methods such as leave-one-out cross-validation (LOOCV) and K-fold cross-validation (K-FoldCV) have been developed to determine the optimal number of partial least squares (PLS) components. Nonetheless, they are easily affected by HLPs. Thus, robust cross-validation techniques, denoted RMD-MRCD-PCA-LOOCV and RMD-MRCD-PCA-K-FoldCV, are proposed to remedy this problem. The results of the simulation study and real data sets indicate that the proposed methods successfully select the appropriate number of PLS components. The statistically inspired modification of partial least squares (SIMPLS) is a popular method for dealing with multicollinearity in high-dimensional data. Nonetheless, the SIMPLS method is vulnerable to the existence of HLPs. Hence, a robust weighted SIMPLS based on the RMD-MRCD-PCA weighting (RMD-MRCD-PCA-RWSIMPLS) is established to overcome this issue. Simulation experiments and real examples demonstrate that RMD-MRCD-PCA-RWSIMPLS is more efficient than the SIMPLS and RWSIMPLS methods. Partial least squares discriminant analysis (PLSDA) is a popular classifier for HDD. Nevertheless, PLSDA is easily affected by the presence of HLPs.
Hence, a robust weighted partial least squares discriminant analysis based on the weighting function of RMD-MRCD-PCA (RMD-MRCD-PCA-RWPLSDA) is proposed to close this gap in the literature. The results of the simulation study and real datasets show that the RMD-MRCD-PCA-RWPLSDA method successfully and efficiently classifies data into binary and multiple groups. The Hotelling T2 based on PLS (T2-PLS) method has been proposed as a variable selection technique in HDD. However, T2-PLS is not resistant to HLPs. To rectify this issue, a robust Hotelling T2 variable selection method based on RMD-MRCD-PCA-RWSIMPLS is proposed. The results of the simulation study and real datasets indicate that the T2-RMD-MRCD-PCA-RWSIMPLS method successfully selects an appropriate number of important variables to include in the model, with the smallest mean squared error.
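The robust weighted variants (RWSIMPLS, RWPLSDA) share a common recipe: convert robust distances into observation weights, then fit the PLS model on the reweighted data so that HLPs cannot drag the components. A minimal sketch under loudly stated assumptions — plain PLS1 via NIPALS instead of SIMPLS, a crude row-norm distance instead of the RMD-MRCD-PCA distance, and hard 0/1 rejection instead of the thesis's smooth weighting function:

```python
import numpy as np

def pls1_coef(Xc, yc, k):
    """PLS1 via NIPALS on centered data; returns B with yhat_c = Xc @ B."""
    Xk, yk = Xc.copy(), yc.copy()
    W, P, Q = [], [], []
    for _ in range(k):
        w = Xk.T @ yk                 # weight vector for this component
        w /= np.linalg.norm(w)
        t = Xk @ w                    # scores
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        q = (yk @ t) / tt             # y loading
        Xk -= np.outer(t, p)          # deflate
        yk -= q * t
        W.append(w); P.append(p); Q.append(q)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(Q))

def robust_weighted_pls1(X, y, k, d, cutoff):
    """Zero out observations whose distance exceeds the cutoff (hard
    rejection as a simple stand-in for the RWSIMPLS-style weighting),
    then fit PLS1 on the reweighted, weighted-centered data."""
    w = (d <= cutoff).astype(float)
    xm = np.average(X, axis=0, weights=w)
    ym = np.average(y, weights=w)
    s = np.sqrt(w)
    B = pls1_coef(s[:, None] * (X - xm), s * (y - ym), k)
    return xm, ym, B

rng = np.random.default_rng(1)
n, p = 80, 300
T = rng.normal(size=(n, 2))                      # two latent components
X = T @ rng.normal(size=(2, p)) + 0.1 * rng.normal(size=(n, p))
y = T @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=n)
X[:4] += 15.0                                    # four bad leverage rows
y[:4] -= 20.0
# crude row-distance proxy (the thesis would use RMD-MRCD-PCA distances)
d = np.linalg.norm(X - np.median(X, axis=0), axis=1)
cutoff = np.median(d) + 3 * 1.4826 * np.median(np.abs(d - np.median(d)))
xm, ym, B = robust_weighted_pls1(X, y, k=2, d=d, cutoff=cutoff)
rmse_clean = np.sqrt(np.mean((y[4:] - (ym + (X[4:] - xm) @ B)) ** 2))
```

Because the contaminated rows receive zero weight, both the centering and the extracted components come from the clean majority, and the fit on the uncontaminated rows remains close to the noise level; an unweighted fit on the same data would let the four planted rows tilt the leading component.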
first_indexed 2025-11-15T14:41:32Z
format Thesis
id upm-118358
institution Universiti Putra Malaysia
institution_category Local University
language English
last_indexed 2025-11-15T14:41:32Z
publishDate 2023
recordtype eprints
repository_type Digital Repository
spelling upm-118358 2025-08-04T06:14:13Z http://psasir.upm.edu.my/id/eprint/118358/ Robust diagnostics and parameter estimation in linear regression for high dimensional data Abdul Wahab, Siti Zahariah 2023 Thesis NonPeerReviewed text en http://psasir.upm.edu.my/id/eprint/118358/1/118358.pdf Abdul Wahab, Siti Zahariah (2023) Robust diagnostics and parameter estimation in linear regression for high dimensional data. Doctoral thesis, Universiti Putra Malaysia. http://ethesis.upm.edu.my/id/eprint/18371 High-dimensional data Robust statistics Linear models (Statistics)
spellingShingle High-dimensional data
Robust statistics
Linear models (Statistics)
Abdul Wahab, Siti Zahariah
Robust diagnostics and parameter estimation in linear regression for high dimensional data
title Robust diagnostics and parameter estimation in linear regression for high dimensional data
title_full Robust diagnostics and parameter estimation in linear regression for high dimensional data
title_fullStr Robust diagnostics and parameter estimation in linear regression for high dimensional data
title_full_unstemmed Robust diagnostics and parameter estimation in linear regression for high dimensional data
title_short Robust diagnostics and parameter estimation in linear regression for high dimensional data
title_sort robust diagnostics and parameter estimation in linear regression for high dimensional data
topic High-dimensional data
Robust statistics
Linear models (Statistics)
url http://psasir.upm.edu.my/id/eprint/118358/
http://psasir.upm.edu.my/id/eprint/118358/1/118358.pdf