THE PROCESS OF CREATING AN AUTHOR’S DATASET FOR RETRAINING MODELS OF NETWORK INTRUSION DETECTION SYSTEMS BASED ON DEEP LEARNING

DOI:

https://doi.org/10.35546/kntu2078-4481.2025.2.2.30

Keywords:

network security, NIDS, deep learning, dataset, network traffic, cyber threats, attack simulation

Abstract

The article describes the process of creating an original dataset of real network traffic intended for training network intrusion detection system (NIDS) models based on deep learning methods. Given the limited availability of modern, complete, and diverse datasets, the authors developed a comprehensive methodology for generating such a dataset in a controlled environment that simulates realistic corporate network conditions. The generated traffic includes both legitimate actions of ordinary users (web browsing, file sharing, e-mail, etc.) and various cyber threats, including DDoS, DoS, reconnaissance, attacks on web applications, brute-force attacks, and others. Tools such as Metasploit, Hping3, Slowloris, Hydra, and others were used to simulate attacks, while legitimate traffic was produced by automated activity on multiple nodes that mimicked the behavior of ordinary users. Particular attention is paid to adapting the data collection process to limited resources without sacrificing the quality or representativeness of the collected traffic. An approach to balancing the volumes of legitimate and anomalous traffic is proposed, which allows the dataset to be used effectively for training, retraining, validating, and testing NIDS models. In addition, emphasis is placed on the structure, labeling, and metadata of the collected data, which simplifies further processing and analysis. The developed dataset can become a valuable resource for the cybersecurity research community, especially for building adaptive, intelligent intrusion detection systems capable of recognizing both known and previously unseen types of attacks. The results obtained confirm the reliability of the proposed methodology, the high quality of the collected traffic, and its relevance to current network security challenges.
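
The abstract mentions labeling the collected traffic, balancing legitimate and anomalous flows, and preparing the data for training, validation, and testing, but does not state how these steps are performed. The following is a minimal sketch under stated assumptions, not the authors' methodology: the `Flow` and `AttackWindow` records, the rule that labels a flow by matching the attacker IP and the attack's time window, random undersampling of benign flows to a hypothetical `max_benign_ratio`, and the 70/15/15 stratified split are all illustrative choices.

```python
import random
from collections import defaultdict
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Flow:
    # Hypothetical minimal flow record; real flow exporters emit many more features.
    src_ip: str
    dst_ip: str
    start_ts: float  # flow start time, seconds since epoch
    label: str = "UNLABELED"


@dataclass(frozen=True)
class AttackWindow:
    # Hypothetical attack-log entry recorded while an attack tool was running.
    attacker_ip: str
    start_ts: float
    end_ts: float
    attack_type: str


def label_flows(flows, windows):
    """Assumed rule: a flow gets the attack type if its source IP is the attacker
    and it starts inside that attack's time window; otherwise it is BENIGN."""
    labeled = []
    for f in flows:
        label = "BENIGN"
        for w in windows:
            if f.src_ip == w.attacker_ip and w.start_ts <= f.start_ts <= w.end_ts:
                label = w.attack_type
                break
        labeled.append(replace(f, label=label))
    return labeled


def balance(flows, max_benign_ratio=1.0, seed=0):
    """Randomly undersample BENIGN flows so that there are at most
    max_benign_ratio benign flows per attack flow; attack flows are all kept."""
    rng = random.Random(seed)
    benign = [f for f in flows if f.label == "BENIGN"]
    attacks = [f for f in flows if f.label != "BENIGN"]
    keep = min(len(benign), int(len(attacks) * max_benign_ratio))
    return attacks + rng.sample(benign, keep)


def stratified_split(flows, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split flows into train/validation/test parts, shuffling and cutting each
    label group separately so every part keeps roughly the same class mix."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for f in flows:
        by_label[f.label].append(f)
    train, val, test = [], [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_train = int(len(group) * fractions[0])
        n_val = int(len(group) * fractions[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to the test part
    return train, val, test
```

For example, with one logged DoS window for a hypothetical attacker at `10.0.0.66`, 20 flows from that host inside the window, and 100 flows from other hosts, `label_flows` yields 20 `DoS` and 100 `BENIGN` flows, and `balance(..., max_benign_ratio=1.5)` keeps all 20 attack flows and 30 randomly chosen benign ones. Undersampling is only one option; oversampling or synthetic generation of the minority class are common alternatives.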

Published

2025-06-05