Evaluating oversampling techniques for network intrusion detection data

In this digital era, the amount of information exchanged over networks has increased exponentially due to technological advancement. Cyberattacks have increased in tandem with the exponential expansion of digitalisation worldwide. As a result, implementing an IDS is one of the appro...

Bibliographic Details
Main Author: Chan, Jia Lin
Format: Final Year Project / Dissertation / Thesis
Published: 2022
Subjects: QA76 Computer software
Online Access: http://eprints.utar.edu.my/5009/
http://eprints.utar.edu.my/5009/1/1902879_CHAN_JIA_LIN.pdf
building UTAR Institutional Repository
collection Online Access
description In this digital era, the amount of information exchanged over networks has increased exponentially due to technological advancement. Cyberattacks have increased in tandem with the exponential expansion of digitalisation worldwide. As a result, implementing an intrusion detection system (IDS) is one approach to addressing network security problems. Many network intrusion data sets have been introduced and used as benchmarks to train predictive models and evaluate IDSs. However, the unbalanced class distribution in these data sets has become a significant challenge in building classification models, leading to low intrusion detection rates (DR). This research identified four unbalanced network intrusion detection data sets with low detection rates in minority attack classes: UNSW-NB15, NSL-KDD, CICIDS2017 and CICDDoS2019. Five oversampling techniques were then applied to the minority attack classes in the data sets: random oversampling (ROS), SMOTE, Borderline-SMOTE (BSMOTE), ADASYN and KMeans SMOTE (KMSMOTE). Three models, namely Gaussian Bayes, Logistic Regression and Decision Tree, were then built using the data sets, and their performance was compared. According to the analysis, the best-performing oversampling method differs by data set: KMSMOTE performs best on UNSW-NB15, ROS on NSL-KDD, and SMOTE on CICIDS2017 and CICDDoS2019. SMOTE has the highest number of top-performing occurrences across all data sets. In general, oversampling can increase the DR for the minority attack classes, with the DR increment ranging from 11.93% in CICDDoS2019 to a maximum of 20.02% in NSL-KDD.
first_indexed 2025-11-15T19:36:20Z
format Final Year Project / Dissertation / Thesis
id utar-5009
institution Universiti Tunku Abdul Rahman
institution_category Local University
last_indexed 2025-11-15T19:36:20Z
publishDate 2022
recordtype eprints
repository_type Digital Repository
topic QA76 Computer software
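
The oversampling and evaluation steps named in the description can be illustrated with a minimal sketch. It is not the thesis code: the function names (`random_oversample`, `smote_like_sample`, `detection_rate`) and the toy data are hypothetical, DR is assumed here to mean per-class recall, and the SMOTE helper shows only the interpolation step, with the nearest-neighbour search that real SMOTE performs left out.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """ROS sketch: duplicate minority-class rows at random until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def smote_like_sample(a, b, rng):
    """SMOTE-style interpolation only: return a synthetic point on the
    segment between two minority samples a and b (neighbour search omitted)."""
    t = rng.random()
    return [x + t * (y - x) for x, y in zip(a, b)]


def detection_rate(y_true, y_pred, attack_class):
    """Assumed definition: DR = detected attacks / actual attacks (recall)."""
    preds_for_actual = [p for t, p in zip(y_true, y_pred) if t == attack_class]
    if not preds_for_actual:
        return 0.0
    return sum(p == attack_class for p in preds_for_actual) / len(preds_for_actual)


# Toy data: four "normal" flows and two flows of a hypothetical minority class.
rows = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [5.0, 5.1], [5.2, 4.9]]
labels = ["normal"] * 4 + ["worm"] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

ROS only repeats existing minority rows, whereas the SMOTE family creates new points between them, which is one plausible reason the best method could differ between data sets.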