Robust diagnostics and parameter estimation in linear regression for high dimensional data
| Main Author: | Abdul Wahab, Siti Zahariah |
|---|---|
| Format: | Thesis |
| Language: | English |
| Published: | 2023 |
| Subjects: | High-dimensional data; Robust statistics; Linear models (Statistics) |
| Online Access: | http://psasir.upm.edu.my/id/eprint/118358/ http://psasir.upm.edu.my/id/eprint/118358/1/118358.pdf |
| _version_ | 1848867755506270208 |
|---|---|
| author | Abdul Wahab, Siti Zahariah |
| author_facet | Abdul Wahab, Siti Zahariah |
| author_sort | Abdul Wahab, Siti Zahariah |
| building | UPM Institutional Repository |
| collection | Online Access |
| description | Several methods for identifying high leverage points (HLPs) in high-dimensional data (HDD) have been put forth, including Robust Mahalanobis Distance (RMD) based on the Minimum Regularized Covariance Determinant (MRCD) and Robust Principal Component Analysis (ROBPCA). However, these methods suffer from masking and swamping effects when the number of predictor variables reaches 700 or more. To address this problem, a modified HLP detection method, Robust Mahalanobis Distance based on the combination of the Minimum Regularized Covariance Determinant and Principal Component Analysis (RMD-MRCD-PCA), is proposed. Empirical evidence from simulation studies and real data shows that the RMD-MRCD-PCA method detects HLPs successfully, with negligible masking and swamping effects.
Numerous classical methods, such as leave-one-out cross-validation (LOOCV) and K-fold cross-validation (K-FoldCV), have been developed to determine the optimal number of PLS components. Nonetheless, they are easily affected by HLPs. Thus, robust cross-validation techniques, denoted RMD-MRCD-PCA-LOOCV and RMD-MRCD-PCA-K-FoldCV, are proposed to remedy this problem. The results of the simulation study and real data sets indicate that the proposed methods successfully select the appropriate number of PLS components.
The statistically inspired modification of partial least squares (SIMPLS) is a popular method for dealing with multicollinearity in high-dimensional data. Nonetheless, SIMPLS is vulnerable to the presence of HLPs. Hence, a robust weighted SIMPLS based on RMD-MRCD-PCA (RMD-MRCD-PCA-RWSIMPLS) is established to overcome this issue. Simulation experiments and real examples demonstrate that RMD-MRCD-PCA-RWSIMPLS is more efficient than the SIMPLS and RWSIMPLS methods.
Partial least squares discriminant analysis (PLSDA) is a popular classifier for HDD. Nevertheless, PLSDA is easily affected by the presence of HLPs. Hence, a robust weighted partial least squares discriminant analysis based on the weighting function of RMD-MRCD-PCA (RMD-MRCD-PCA-RWPLSDA) is proposed to close this gap in the literature. The results of the simulation study and real datasets show that the RMD-MRCD-PCA-RWPLSDA method classifies data into binary and multiple groups successfully and efficiently.
The Hotelling T2 method based on PLS (T2-PLS) has been proposed as a variable selection technique for HDD. However, T2-PLS is not resistant to HLPs. To rectify this issue, a robust Hotelling T2 variable selection method based on RMD-MRCD-PCA-RWSIMPLS is proposed. The results of the simulation study and real datasets indicate that the T2-RMD-MRCD-PCA-RWSIMPLS method successfully selects an appropriate number of important variables for inclusion in the model, with the smallest mean squared error. |
| first_indexed | 2025-11-15T14:41:32Z |
| format | Thesis |
| id | upm-118358 |
| institution | Universiti Putra Malaysia |
| institution_category | Local University |
| language | English |
| last_indexed | 2025-11-15T14:41:32Z |
| publishDate | 2023 |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | upm-1183582025-08-04T06:14:13Z http://psasir.upm.edu.my/id/eprint/118358/ Robust diagnostics and parameter estimation in linear regression for high dimensional data Abdul Wahab, Siti Zahariah 2023 Thesis NonPeerReviewed text en http://psasir.upm.edu.my/id/eprint/118358/1/118358.pdf Abdul Wahab, Siti Zahariah (2023) Robust diagnostics and parameter estimation in linear regression for high dimensional data. Doctoral thesis, Universiti Putra Malaysia. http://ethesis.upm.edu.my/id/eprint/18371 High-dimensional data Robust statistics Linear models (Statistics) |
| spellingShingle | High-dimensional data Robust statistics Linear models (Statistics) Abdul Wahab, Siti Zahariah Robust diagnostics and parameter estimation in linear regression for high dimensional data |
| title | Robust diagnostics and parameter estimation in linear regression for high dimensional data |
| title_full | Robust diagnostics and parameter estimation in linear regression for high dimensional data |
| title_fullStr | Robust diagnostics and parameter estimation in linear regression for high dimensional data |
| title_full_unstemmed | Robust diagnostics and parameter estimation in linear regression for high dimensional data |
| title_short | Robust diagnostics and parameter estimation in linear regression for high dimensional data |
| title_sort | robust diagnostics and parameter estimation in linear regression for high dimensional data |
| topic | High-dimensional data Robust statistics Linear models (Statistics) |
| url | http://psasir.upm.edu.my/id/eprint/118358/ http://psasir.upm.edu.my/id/eprint/118358/ http://psasir.upm.edu.my/id/eprint/118358/1/118358.pdf |
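The abstract's central idea can be sketched in code: reduce the high-dimensional design matrix with PCA, compute Mahalanobis distances of the scores from a robust location/scatter estimate, flag points beyond a chi-squared-type cutoff, and map the distances to downweights for a subsequent weighted fit. This is only an illustrative sketch, not the thesis's algorithm: the MRCD scatter estimator is replaced here by a crude trimmed estimator, the function names are hypothetical, and the cutoff/median constants assume 5 retained components (from scipy.stats.chi2 one would use `chi2.ppf(0.975, 5)` and `chi2.ppf(0.5, 5)`).

```python
import numpy as np


def rmd_hlp_flags(X, n_components=5, trim=0.25, cutoff=12.833):
    """Flag high leverage points (HLPs) in a high-dimensional matrix X.

    Sketch of the RMD-MRCD-PCA idea: reduce X with PCA, then compute
    squared Mahalanobis distances of the scores from a robust
    location/scatter estimate.  The thesis uses MRCD for the robust
    scatter; the trimmed estimator below is only a stand-in.  The default
    cutoff is the chi-squared 97.5% quantile for 5 degrees of freedom, so
    it should be changed if n_components is changed.
    """
    # 1. PCA via SVD of the centred data; keep the leading scores.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = Xc @ Vt[:n_components].T                       # n x k score matrix

    # 2. Crude robust fit: discard the `trim` fraction of points with the
    #    largest classical distances, then re-estimate mean and covariance.
    d0 = _sq_mahalanobis(T, T.mean(axis=0), np.cov(T, rowvar=False))
    keep = d0 <= np.quantile(d0, 1.0 - trim)
    mu, S = T[keep].mean(axis=0), np.cov(T[keep], rowvar=False)

    # 3. Robust squared distances; rescale so their median matches the
    #    chi-squared(5) median (4.351), a standard consistency correction.
    d = _sq_mahalanobis(T, mu, S)
    d *= 4.351 / np.median(d)
    return d > cutoff, d


def _sq_mahalanobis(T, mu, S):
    """Squared Mahalanobis distance of each row of T from (mu, S)."""
    diff = T - mu
    return np.einsum('ij,jk,ik->i', diff, np.linalg.inv(S), diff)


def hlp_weights(d, cutoff=12.833):
    """Map squared robust distances to downweights: 1 for clean points,
    shrinking toward 0 as the distance exceeds the cutoff.  Scaling the
    rows of (X, y) by sqrt(w) gives a simple weighted fit of the kind the
    RWSIMPLS/RWPLSDA-style estimators build on."""
    return np.minimum(1.0, cutoff / d)
```

A quick check on synthetic data (n = 60 observations, p = 100 predictors, three planted HLPs shifted in their first ten coordinates) shows the planted points receiving by far the largest robust distances and weights well below 1, while clean points keep weight 1.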