A novel approach for handling missing data to enhance network intrusion detection system
Managing missing data is a critical challenge in Intrusion Detection System (IDS) datasets, significantly affecting the performance of deep learning models. To address this issue, we introduce DeepLearning_Based_MissingData_Imputation (DMDI), a novel method designed to enhance the quality of input d...
| Main Authors: | Tahir, Mahjabeen; Abdullah, Azizol; Udzir, Nur Izura; Kasmiran, Khairul Azhar |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | KeAi Communications 2025 |
| Online Access: | http://psasir.upm.edu.my/id/eprint/120305/ http://psasir.upm.edu.my/id/eprint/120305/1/120305.pdf |
| _version_ | 1848868158777065472 |
|---|---|
| author | Tahir, Mahjabeen; Abdullah, Azizol; Udzir, Nur Izura; Kasmiran, Khairul Azhar |
| author_facet | Tahir, Mahjabeen; Abdullah, Azizol; Udzir, Nur Izura; Kasmiran, Khairul Azhar |
| author_sort | Tahir, Mahjabeen |
| building | UPM Institutional Repository |
| collection | Online Access |
| description | Managing missing data is a critical challenge in Intrusion Detection System (IDS) datasets, significantly affecting the performance of deep learning models. To address this issue, we introduce DeepLearning_Based_MissingData_Imputation (DMDI), a novel method designed to enhance the quality of input data by efficiently handling missing values. Our approach employs the Random Missing Value (RMV) algorithm to simulate missing data, enabling thorough testing and comparison of various imputation techniques. The DMDI method integrates a stacked denoising autoencoder with Gradient Boosting to improve imputation accuracy. We evaluated the effectiveness of our approach through three experimental phases: generating missing data, imputing missing values, and assessing imputation models. Using the NSL-KDD and UNSW-NB15 datasets, our results demonstrate significant improvements in the performance of five different classifiers (SVM, KNN, Logistic Regression, Decision Tree, and Random Forest) after imputation. On average, our method achieved accuracy improvements ranging from 0.95 to 0.97 across these classifiers compared to baseline imputation methods. Detailed analysis using Python 3 validates our findings, demonstrating enhanced model performance and robustness. This study underscores the necessity of precise missing data imputation for enhancing deep learning tasks, particularly in anomaly detection systems. It provides a reliable solution for managing missing data in IDS datasets. |
| first_indexed | 2025-11-15T14:47:57Z |
| format | Article |
| id | upm-120305 |
| institution | Universiti Putra Malaysia |
| institution_category | Local University |
| language | English |
| last_indexed | 2025-11-15T14:47:57Z |
| publishDate | 2025 |
| publisher | KeAi Communications |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | upm-120305 2025-09-30T02:58:38Z http://psasir.upm.edu.my/id/eprint/120305/ A novel approach for handling missing data to enhance network intrusion detection system. Tahir, Mahjabeen; Abdullah, Azizol; Udzir, Nur Izura; Kasmiran, Khairul Azhar. KeAi Communications 2025 Article PeerReviewed text en cc_by_nc_nd_4 http://psasir.upm.edu.my/id/eprint/120305/1/120305.pdf Tahir, Mahjabeen and Abdullah, Azizol and Udzir, Nur Izura and Kasmiran, Khairul Azhar (2025) A novel approach for handling missing data to enhance network intrusion detection system. Cyber Security and Applications, 3. art. no. 100063. pp. 1-11. ISSN 2772-9184 https://linkinghub.elsevier.com/retrieve/pii/S2772918424000298 10.1016/j.csa.2024.100063 |
| spellingShingle | Tahir, Mahjabeen; Abdullah, Azizol; Udzir, Nur Izura; Kasmiran, Khairul Azhar. A novel approach for handling missing data to enhance network intrusion detection system |
| title | A novel approach for handling missing data to enhance network intrusion detection system |
| title_sort | novel approach for handling missing data to enhance network intrusion detection system |
| url | http://psasir.upm.edu.my/id/eprint/120305/ http://psasir.upm.edu.my/id/eprint/120305/1/120305.pdf |
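
The description field names three experimental phases: generating missing data, imputing missing values, and assessing the imputation. The sketch below illustrates that general workflow only. It is not the authors' RMV algorithm or the DMDI model: the masking is plain uniform random deletion, the imputer is a column-mean baseline standing in for the stacked denoising autoencoder with Gradient Boosting, and the data are synthetic Gaussian values rather than NSL-KDD or UNSW-NB15. All function names (`mask_at_random`, `mean_impute`, `rmse_on_masked`) are hypothetical.

```python
import math
import random


def mask_at_random(rows, missing_rate, seed=0):
    """Phase 1 (assumed form): copy rows, setting ~missing_rate of cells to None."""
    rng = random.Random(seed)
    return [[None if rng.random() < missing_rate else v for v in row] for row in rows]


def mean_impute(rows):
    """Phase 2 (baseline stand-in): fill each None with its column's observed mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Fall back to 0.0 when a column has no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def rmse_on_masked(truth, masked, imputed):
    """Phase 3: root-mean-square error over the cells that were masked."""
    errors = [
        (t - i) ** 2
        for t_row, m_row, i_row in zip(truth, masked, imputed)
        for t, m, i in zip(t_row, m_row, i_row)
        if m is None
    ]
    return math.sqrt(sum(errors) / len(errors)) if errors else 0.0


# Synthetic stand-in data: 200 records with 5 numeric features each.
data_rng = random.Random(42)
truth = [[data_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]

masked = mask_at_random(truth, missing_rate=0.2, seed=1)
imputed = mean_impute(masked)
print(f"RMSE on masked cells: {rmse_on_masked(truth, masked, imputed):.3f}")
```

Scoring only the masked cells keeps the error from being diluted by values that were never missing; a stronger imputer can be compared by swapping it in for `mean_impute` and reusing the same mask.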