Sensitivity of missing values in classification tree for large sample

Bibliographic Details
Main Authors: Hasan, Norsida; Adam, Mohd Bakri; Mustapha, Norwati; Abu Bakar, Mohd Rizam
Format: Conference or Workshop Item
Language: English
Published: American Institute of Physics 2011
Online Access: http://psasir.upm.edu.my/id/eprint/57334/
http://psasir.upm.edu.my/id/eprint/57334/1/Sensitivity%20of%20missing%20values%20in%20classification%20tree%20for%20large%20sample.pdf
Description
Summary: Missing values, whether in predictor or in response variables, are a very common problem in statistics and data mining. Cases with missing values are often ignored, which results in loss of information and possible bias. The objective of our research was to investigate the sensitivity of the classification tree model to missing data in large samples. Data were obtained from a higher education institution in Malaysia. Students' background data were randomly eliminated, and a classification tree was used to predict students' degree classification. The results showed that, for large samples, the structure of the classification tree was sensitive to missing values, especially when the sample contained more than ten percent missing values.
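
The random elimination step mentioned in the summary can be illustrated with a minimal sketch. It assumes values are removed completely at random across all cells at a fixed rate; the summary does not state the exact mechanism used in the study. The function name `eliminate_at_random`, the toy data, and the 10% rate are hypothetical and chosen only for illustration.

```python
import random


def eliminate_at_random(rows, rate, seed=0):
    """Return a copy of `rows` with a fraction `rate` of cells set to None.

    Assumption: cells are removed completely at random; the study's
    actual elimination scheme is not described in the summary.
    """
    rng = random.Random(seed)
    # Enumerate every (row, column) position in the table.
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    n_missing = round(rate * len(cells))
    # Copy each row so the original data stay untouched.
    masked = [list(row) for row in rows]
    for i, j in rng.sample(cells, n_missing):
        masked[i][j] = None
    return masked


# Hypothetical toy data: 100 students with 4 background variables each.
data = [[s % 2, s % 3, s % 5, s % 7] for s in range(100)]
masked = eliminate_at_random(data, rate=0.10)
missing = sum(value is None for row in masked for value in row)
print(missing, "of", 100 * 4, "cells are missing")
```

Repeating this for several rates and fitting a classification tree to each masked copy would be one way to compare tree structures across levels of missingness.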