Exploration of eXtreme Gradient Boosting (XGBoost)-Based Pre-Processing Techniques for DDoS Attacks
DOI: https://doi.org/10.30865/jurikom.v12i6.9380
Keywords: DDoS, CIC-IoT2023, XGBoost, Pre-processing, Feature Selection, SMOTE
Abstract
Distributed Denial of Service (DDoS) attacks are a critical threat to modern network security, particularly in Internet of Things (IoT) environments, where traffic is large-scale and heterogeneous. The main challenges in detecting such attacks are class imbalance, irrelevant features, and noise in the data. All of these can degrade the performance of machine learning-based detection models. This study evaluates how a pre-processing pipeline affects the performance of the XGBoost algorithm in detecting DDoS attacks on the CIC-IoT2023 dataset. The pipeline comprises the Synthetic Minority Over-sampling Technique (SMOTE), correlation-based feature selection, and advanced feature selection methods. Experimental results show that the XGBoost model trained on raw data achieves exceptionally high performance, with an accuracy of 0.999983, a precision of 0.985531, a recall of 0.961390, and an F1-score of 0.999983. After pre-processing, however, all metrics declined: accuracy fell to 0.958899, precision to 0.865729, recall to 0.748332, and the F1-score to 0.959158. The lower recall means more attacks go undetected, while the lower precision means more false alarms. Nevertheless, an F1-score that remains above 0.95 indicates that the model still performs well overall. These findings show that pre-processing does not always improve performance, especially when the raw dataset is already relatively clean and balanced. The study offers insight into how SMOTE, feature selection, and noise injection affect the generalization of XGBoost on IoT traffic. It also emphasizes that the effectiveness of pre-processing depends strongly on dataset characteristics and on the intended application context of the intrusion detection system.
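The abstract names SMOTE as the first pre-processing step but does not give its parameters. The following sketch shows the core idea of SMOTE in plain Python. Each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The function name `smote`, the value of k, the random choice of base samples, and the toy data are illustrative assumptions, not the study's implementation. In practice, a library implementation such as `SMOTE` from imbalanced-learn would normally be used.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical minimal SMOTE: create n_synthetic points by interpolating
    between random minority samples and their k nearest minority neighbours
    (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)  # seeded so the output is reproducible
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to `base`.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical toy minority class with two features.
    minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
    for point in smote(minority, n_synthetic=6, k=2):
        print([round(v, 3) for v in point])
```

Every synthetic point is a convex combination of two real minority points. SMOTE therefore fills in the region the minority class already occupies rather than extrapolating beyond it. This is one reason oversampling can add little when the minority classes are already well represented.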
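The abstract does not say whether its correlation-based feature selection measures redundancy between features or relevance to the class label. Under the redundancy interpretation, a minimal sketch computes pairwise Pearson correlations and drops any feature whose absolute correlation with an already-kept feature exceeds a threshold. The helper names `pearson` and `drop_correlated`, the 0.9 threshold, and the feature names are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: r is undefined, treat as uncorrelated
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.9):
    """columns: dict mapping feature name -> list of values.
    Scan features in insertion order and keep a feature only if its |r|
    with every already-kept feature is at most `threshold`."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Hypothetical flow features; byte_count is exactly 60 * pkt_count.
    columns = {
        "pkt_count": [10, 20, 30, 40, 50],
        "byte_count": [600, 1200, 1800, 2400, 3000],
        "duration": [0.5, 0.1, 0.9, 0.3, 0.7],
    }
    print(drop_correlated(columns, threshold=0.9))
```

Features are scanned in order, so the first member of a correlated pair is kept. Reordering the columns can change which feature survives.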
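The abstract reads lower recall as more undetected attacks and lower precision as more false alarms. For binary attack/benign labels, this follows directly from the confusion-matrix definitions. Recall is TP/(TP+FN), so missed attacks (FN) reduce it. Precision is TP/(TP+FP), so false alarms (FP) reduce it. The function name and toy labels are hypothetical. The abstract also does not state whether its scores are binary or averaged across classes (for example, macro or weighted averaging), and that choice can change the headline figures.

```python
def detection_metrics(y_true, y_pred, positive=1):
    """Binary detection metrics, treating `positive` as the attack class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)  # missed attacks
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical labels for 10 flows: 1 = attack, 0 = benign.
    y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    m = detection_metrics(y_true, y_pred)
    print(f"missed attacks (FN): {m['fn']}, false alarms (FP): {m['fp']}")
    print({k: round(v, 4) for k, v in m.items()})
```

In this toy case, two missed attacks pull recall down to 0.6, and a single false alarm pulls precision down to 0.75. Accuracy still reaches 0.7 because the benign flows are mostly classified correctly.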