A cascaded classifier approach for improving detection rates on rare attack categories in network intrusion detection

Bibliographic Details
Main Authors: Khor, Kok Chin; Ting, Choo Yee; Somnuk, Phon Amnuaisuk
Format: Article
Language: English
Published: Springer US 2012
Online Access: http://shdl.mmu.edu.my/3463/
http://shdl.mmu.edu.my/3463/1/A%20cascaded%20classifier%20approach%20for%20improving%20detection%20rates%20on%C2%A0rare%20attack%20categories%20in%20network%20intrusion%20detection.pdf
Description
Summary: Network intrusion detection research that uses the KDDCup 99 dataset often struggles to build classifiers that can handle unequally distributed attack categories. The accuracy of a classification model can be jeopardized if the distribution of attack categories in the training dataset is heavily imbalanced, with the rare categories making up less than 2% of the total population. In such cases, the model cannot efficiently learn the characteristics of the rare categories, which results in poor detection rates for them. In this research, we introduce an efficient and effective approach for dealing with the unequal distribution of attack categories. Our approach relies on training cascaded classifiers, using a dichotomized training dataset at each cascading stage. The training dataset is dichotomized into rare and non-rare attack categories. The empirical findings support our argument that training cascaded classifiers on the dichotomized dataset yields higher detection rates on the rare categories, as well as comparably higher detection rates on the non-rare attack categories, than those reported in other research works. The higher detection rates arise because separating the rare attack categories from the rest of the dataset mitigates the influence of the dominant categories.
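
The cascading idea in the summary can be illustrated with a minimal Python sketch. It is one possible reading of the approach, not the authors' implementation: the summary does not name a base classifier, so a tiny nearest-centroid model stands in for it, and the class names (`NearestCentroid`, `CascadedClassifier`), the two-stage layout, the choice of `u2r` and `r2l` as the rare labels, and the two-feature toy records are all assumptions made for illustration. Stage 1 is trained on the dichotomized labels (rare vs. non-rare); stage 2 holds one model per partition, trained only on that partition's records, so the dominant categories cannot influence the model that separates the rare ones.

```python
import math
from collections import defaultdict


class NearestCentroid:
    """Stand-in base classifier (an assumption): predicts the label of the closest class mean."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for row, label in zip(X, y):
            if label not in sums:
                sums[label] = [0.0] * len(row)
            for i, value in enumerate(row):
                sums[label][i] += value
            counts[label] += 1
        self.centroids = {
            label: [s / counts[label] for s in vec] for label, vec in sums.items()
        }
        return self

    def predict_one(self, row):
        return min(self.centroids, key=lambda label: math.dist(row, self.centroids[label]))


class CascadedClassifier:
    """Hypothetical two-stage cascade trained on a dichotomized dataset."""

    def __init__(self, rare_labels, make_model=NearestCentroid):
        self.rare_labels = set(rare_labels)
        self.make_model = make_model

    def fit(self, X, y):
        # Stage 1: dichotomize every label into "rare" or "non_rare".
        groups = ["rare" if label in self.rare_labels else "non_rare" for label in y]
        self.stage1 = self.make_model().fit(X, groups)
        # Stage 2: one model per partition, trained only on that partition.
        self.stage2 = {}
        for group in ("rare", "non_rare"):
            X_part = [x for x, g in zip(X, groups) if g == group]
            y_part = [label for label, g in zip(y, groups) if g == group]
            self.stage2[group] = self.make_model().fit(X_part, y_part)
        return self

    def predict(self, X):
        # Route each record through stage 1, then through the matching stage-2 model.
        return [self.stage2[self.stage1.predict_one(x)].predict_one(x) for x in X]


# Toy, made-up records with two numeric features; not real KDDCup 99 data.
X_train = [
    [0, 0], [1, 0], [0, 1], [1, 1],   # normal (non-rare)
    [5, 0], [6, 0], [5, 1], [6, 1],   # dos (non-rare)
    [20, 20],                         # u2r (rare)
    [20, 26],                         # r2l (rare)
]
y_train = ["normal"] * 4 + ["dos"] * 4 + ["u2r", "r2l"]

cascade = CascadedClassifier(rare_labels={"u2r", "r2l"}).fit(X_train, y_train)
predictions = cascade.predict([[0.5, 0.5], [5.5, 0.2], [19, 21], [21, 25]])
print(predictions)
```

Passing a different factory as `make_model` swaps the base learner without touching the cascade logic, which keeps the dichotomization step separate from the choice of classifier.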