Integrating LightGBM and XGBoost for Software Defect Classification Problem

(*) Gregorius Airlangga (Atma Jaya Catholic University of Indonesia, Jakarta, Indonesia)

(*) Corresponding Author

Submitted: January 3, 2024; Published: January 24, 2024


Software defect classification is a core quality-assurance activity and essential to building reliable software systems. This paper combines traditional software complexity metrics with two gradient-boosting algorithms, Light Gradient Boosting Machine (LightGBM) and Extreme Gradient Boosting (XGBoost), to improve the accuracy and efficiency of software defect classification. Using a dataset described by McCabe's and Halstead's metrics, the study applies data preprocessing, feature engineering, and hyperparameter optimization to train and evaluate the proposed models. Both models are tuned with the Optuna framework to maximize the ROC-AUC score, the measure of classification performance used here. The results indicate that both models perform robustly, with XGBoost showing slightly better predictive capability. Integrating machine learning with traditional complexity metrics improves defect classification and also offers insight into the factors that influence software quality. The findings suggest that such hybrid approaches can strengthen the predictive analytics tools available to software engineers and quality assurance professionals. The paper contributes a methodological framework and empirical evidence for the effectiveness of combining machine learning algorithms with traditional software complexity metrics in software defect classification.
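The abstract names the ROC-AUC score as the quantity the tuning maximizes. As a minimal sketch of what that score measures (not the authors' implementation, which the abstract does not show), ROC-AUC can be computed as the fraction of defective/non-defective pairs in which the defective module receives the higher predicted score, with ties counted as one half. The function name `roc_auc` and the sample labels and scores below are illustrative assumptions, not data from the paper.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of ROC-AUC.

    labels: iterable of 0/1 values (1 = defective module, an assumed encoding).
    scores: predicted defect scores, one per label; higher means more likely defective.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # defective module ranked above clean module
            elif p == n:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for four modules, two of them defective.
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
example_auc = roc_auc(example_labels, example_scores)
print(example_auc)
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of defective over non-defective modules. The quadratic pairwise loop is chosen for readability; a rank-based computation would be preferable on large datasets.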


Software Defect; Classification; LightGBM; XGBoost; Machine Learning






Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

STMIK Budi Darma
Secretariat: Sisingamangaraja No. 338 Telp 061-7875998
