Two steps robust Mahanalobis method for identification of multiple high leverage points

Problem statement: High leverage points are extreme outliers in the X-direction. In regression analysis, detecting these points is important because of their arbitrarily large effects on the estimates and on multicollinearity problems. Mahalanobis Distance (MD) has been used as...

Bibliographic Details
Main Authors: Bagheri, Arezoo; Midi, Habshah; Rahmatullah Imon, A.H.M
Format: Article
Language: English
Published: Science Publications 2009
Subjects: Mathematical statistics; Multicollinearity
Online Access: http://psasir.upm.edu.my/id/eprint/17502/
http://psasir.upm.edu.my/id/eprint/17502/1/Two%20steps%20robust%20Mahanalobis%20method%20for%20identification%20of%20multiple%20high%20leverage%20points.pdf
_version_ 1848843259100528640
author Bagheri, Arezoo
Midi, Habshah
Rahmatullah Imon, A.H.M
building UPM Institutional Repository
collection Online Access
description Problem statement: High leverage points are extreme outliers in the X-direction. In regression analysis, detecting these points is important because of their arbitrarily large effects on the estimates and on multicollinearity problems. Mahalanobis Distance (MD) has been used as a diagnostic tool for identifying outliers in multivariate analysis, where it measures the distance between the normal and abnormal groups of the data. Because the computation of MD relies on nonrobust classical estimates, the classical MD can hardly detect outliers accurately. As an alternative, Robust MD (RMD) methods based on the Minimum Covariance Determinant (MCD) and Minimum Volume Ellipsoid (MVE) estimators have been used to identify high leverage points in a data set. However, although these methods identify high leverage points correctly, they tend to swamp some low leverage points. Since the detection of leverage points is one of the most important issues in regression analysis, it is imperative to introduce a novel method for detecting high leverage points.
Approach: In this study, we proposed a relatively new two-step method for detecting high leverage points. The first step uses RMD (MVE) and RMD (MCD) to identify the suspected outlying points. The second step computes the MD from the mean and covariance of the clean data set. We called this method the two-step Robust Diagnostic Mahalanobis Distance (RDMDTS); it could identify high leverage points correctly while swamping fewer low leverage points.
Results: The merit of the newly proposed method was investigated extensively using real data sets and a Monte Carlo simulation study. The results indicated that, for small sample sizes, the best detection method is (RDMDTS) (MVE)-mad, while there was not much difference between (RDMDTS) (MVE)-mad and (RDMDTS) (MCD)-mad for large sample sizes.
Conclusion/Recommendations: To reduce the swamping of low leverage points as high leverage points, the proposed robust diagnostic methods, (RDMDTS) (MVE)-mad and (RDMDTS) (MCD)-mad, were recommended.
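The two-step idea in the description can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: step one uses a simple deterministic concentration-step approximation in the spirit of MCD as a stand-in for the MVE and MCD estimators, and the cut-off (median plus three times the normalized MAD of the distances) is an assumption suggested by the "-mad" suffix rather than something the abstract states. All function names and the toy data are hypothetical.

```python
from statistics import median

def mean_cov(rows):
    """Classical mean vector and sample covariance matrix."""
    n, p = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
            for b in range(p)] for a in range(p)]
    return mu, cov

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def mahalanobis(rows, mu, cov):
    """Mahalanobis distance of every row from (mu, cov)."""
    out = []
    for r in rows:
        d = [r[j] - mu[j] for j in range(len(mu))]
        z = solve(cov, d)  # z = cov^{-1} d
        out.append(max(0.0, sum(di * zi for di, zi in zip(d, z))) ** 0.5)
    return out

def mad_cutoff(dist, c=3.0):
    """Assumed cut-off: median + c * MAD / 0.6745."""
    med = median(dist)
    mad = median([abs(d - med) for d in dist]) / 0.6745
    return med + c * mad

def robust_distances(rows, n_iter=20):
    """Stand-in for RMD: concentration steps on a half-sample (MCD-like)."""
    n, p = len(rows), len(rows[0])
    h = (n + p + 1) // 2
    med = [median([r[j] for r in rows]) for j in range(p)]
    scale = [median([abs(r[j] - med[j]) for r in rows]) or 1.0 for j in range(p)]
    score = [sum(((r[j] - med[j]) / scale[j]) ** 2 for j in range(p)) for r in rows]
    subset = sorted(range(n), key=score.__getitem__)[:h]
    for _ in range(n_iter):
        mu, cov = mean_cov([rows[i] for i in subset])
        dist = mahalanobis(rows, mu, cov)
        new = sorted(range(n), key=dist.__getitem__)[:h]
        if set(new) == set(subset):
            break
        subset = new
    return dist

def two_step_leverage(rows, c=3.0):
    """Return indices flagged as high leverage points."""
    # Step 1: robust distances flag the suspected points.
    rmd = robust_distances(rows)
    cut1 = mad_cutoff(rmd, c)
    clean = [r for r, d in zip(rows, rmd) if d <= cut1]
    # Step 2: classical MD of all points, using the clean set's mean and covariance.
    mu, cov = mean_cov(clean)
    md = mahalanobis(rows, mu, cov)
    cut2 = mad_cutoff(md, c)
    return [i for i, d in enumerate(md) if d > cut2]

# Toy data (hypothetical): 20 regular points plus 3 far-away points in X-space.
data = [((i % 5) * 0.5, (i // 5) * 0.5 + 0.1 * (i % 3)) for i in range(20)]
data += [(30.0, 30.0), (32.0, 29.0), (31.0, 33.0)]
flagged = two_step_leverage(data)
print(flagged)
```

In practice, step one would use a proper MVE or MCD estimator from a statistics library; the pure-Python stand-in is used here only to keep the sketch self-contained.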
first_indexed 2025-11-15T08:12:11Z
format Article
id upm-17502
institution Universiti Putra Malaysia
institution_category Local University
language English
last_indexed 2025-11-15T08:12:11Z
publishDate 2009
publisher Science Publications
recordtype eprints
repository_type Digital Repository
spelling upm-17502 2015-09-03T04:09:30Z PeerReviewed application/pdf en
Bagheri, Arezoo and Midi, Habshah and Rahmatullah Imon, A.H.M (2009) Two steps robust Mahanalobis method for identification of multiple high leverage points. Journal of Mathematics and Statistics, 5 (2). pp. 97-106. ISSN 1549-3644
DOI 10.3844/jmssp.2009.97.106
title Two steps robust Mahanalobis method for identification of multiple high leverage points
topic Mathematical statistics.
Multicollinearity.
url http://psasir.upm.edu.my/id/eprint/17502/
http://psasir.upm.edu.my/id/eprint/17502/1/Two%20steps%20robust%20Mahanalobis%20method%20for%20identification%20of%20multiple%20high%20leverage%20points.pdf