Outlier Detection in Logistic Regression: A Quest for Reliable Knowledge from Predictive Modeling and Classification

Bibliographic Details
Main Authors:	Nurunnabi, Abdul; West, Geoff
Other Authors:	Jilles Vreeken
Format:	Conference Paper
Published:	Conference Publishing Services, 2012
Subjects:	data mining; influential observation; pattern recognition; knowledge discovery; reliability; regression; high leverage point; statistical computing; outlier
Online Access:	http://hdl.handle.net/20.500.11937/13467

DOI:	10.1109/ICDMW.2012.107
Editors:	Jilles Vreeken, Charles Ling, Mohammed J. Zaki, Arno Siebes, Jeffrey Xu Yu, Bart Goethals, Geoff Webb, Xindong Wu
Institution:	Curtin University Malaysia
Repository:	Curtin Institutional Repository (record ID curtin-20.500.11937-13467)

Abstract
Logistic regression is well known to the data mining research community as a tool for modeling and classification. The presence of outliers is an unavoidable phenomenon in data analysis. Detection of outliers is important to increase the accuracy of the required estimates and for reliable knowledge discovery from the underlying databases. Most of the existing outlier detection methods in regression analysis are based on the single case deletion approach, which is inefficient in the presence of multiple outliers because of the well-known masking and swamping effects. To avoid these effects, the multiple case deletion approach has been introduced. We propose a diagnostic measure based on a group deletion approach for identifying multiple influential observations in logistic regression. At the same time, we introduce a plotting technique that can classify data into outliers, high leverage points, and influential and regular observations. This paper has two objectives. First, it investigates the problems of outlier detection in logistic regression, proposes a new method that can find multiple influential observations, and classifies the types of outlier. Second, it shows the necessity of properly identifying outliers and influential observations as a prelude to reliable knowledge discovery from modeling and classification via logistic regression. We demonstrate the efficiency of our method, compare its performance with that of existing popular diagnostic methods, and explore the necessity of outlier detection for reliability and robustness in modeling and classification using real datasets.
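The abstract contrasts the proposed group-deletion measure with single-case deletion diagnostics, which can suffer from masking and swamping when several unusual cases occur together. This record does not describe the paper's own measure or plotting technique, so the Python sketch below does not reproduce them. As a reference point only, it fits a logistic regression by iteratively reweighted least squares and computes two classical single-case quantities: the leverage h_ii (the diagonal of the weighted hat matrix) and the standardized Pearson residual. The function names, the synthetic data, and the cutoffs (|r| > 3 and h > 2p/n) are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))


def fit_logistic_irls(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression via IRLS (Newton-Raphson).

    X is an (n, p) design matrix that already contains an intercept column.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        pi = sigmoid(X @ beta)
        w = pi * (1.0 - pi)
        # Newton step: solve (X' W X) delta = X' (y - pi).
        delta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - pi))
        beta = beta + delta
        if np.max(np.abs(delta)) < tol:
            break
    return beta


def single_case_diagnostics(X, y, beta):
    """Return leverages h_ii and standardized Pearson residuals."""
    pi = sigmoid(X @ beta)
    w = pi * (1.0 - pi)
    # Diagonal of H = W^(1/2) X (X' W X)^(-1) X' W^(1/2): h_ii = w_i x_i' A x_i.
    A = np.linalg.inv(X.T @ (w[:, None] * X))
    h = w * np.einsum("ij,jk,ik->i", X, A, X)
    pearson = (y - pi) / np.sqrt(w)
    return h, pearson / np.sqrt(1.0 - h)


# Hypothetical synthetic data: one predictor plus one planted unusual case.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = rng.binomial(1, sigmoid(0.5 + 1.5 * x)).astype(float)
x = np.append(x, 4.0)  # extreme predictor value...
y = np.append(y, 0.0)  # ...paired with the "unexpected" response
X = np.column_stack([np.ones_like(x), x])

beta = fit_logistic_irls(X, y)
h, r = single_case_diagnostics(X, y, beta)
n, p = X.shape

# Rule-of-thumb cutoffs (assumptions, not taken from the paper).
outlier_idx = np.flatnonzero(np.abs(r) > 3.0)
leverage_idx = np.flatnonzero(h > 2.0 * p / n)
print("beta:", beta)
print("large standardized Pearson residuals:", outlier_idx)
print("high-leverage cases:", leverage_idx)
```

Each case here is judged against a fit that still includes every other case, so a cluster of similar outliers can pull the fit toward itself and keep its members below these cutoffs. This is the masking effect that the abstract cites as the weakness of single-case deletion.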

Similar Items