Integration of SMOTE in Naive Bayes and Logistic Regression Based on Particle Swarm Optimization for Software Defect Prediction

 (*)Andre Hardoni (Universitas Sriwijaya, Palembang, Indonesia)
 Dian Palupi Rini (Universitas Sriwijaya, Palembang, Indonesia)
 Sukemi Sukemi (Universitas Sriwijaya, Palembang, Indonesia)

(*) Corresponding Author

Submitted: November 28, 2020; Published: January 22, 2021



Software defects are one of the main contributors to information technology waste: they lead to rework and therefore consume considerable time and money. Software defect prediction aims to support defect prevention by classifying modules as defective or non-defective. Many researchers have studied software defect prediction using the public NASA MDP datasets, but these datasets have shortcomings such as class imbalance and noisy attributes. Class imbalance can be addressed with SMOTE (Synthetic Minority Over-sampling Technique), and noisy attributes can be handled by feature selection with Particle Swarm Optimization (PSO). This research therefore applies the integration of SMOTE and PSO to two machine learning classification techniques, naïve Bayes and logistic regression. Experiments on 8 NASA MDP datasets, each divided into training and testing data, show that integrating SMOTE and PSO improves the classification performance of both techniques, with the highest average AUC (Area Under the Curve) of 0.89 for logistic regression and 0.86 for naïve Bayes in training, in both cases better than without combining the two.
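The abstract names SMOTE as the remedy for class imbalance. As a minimal sketch, assuming plain numeric feature vectors and Euclidean distance, the core idea of SMOTE (Chawla et al.) can be written in pure Python: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The function name `smote`, its parameters, and the toy data are hypothetical and for illustration only; this is not the authors' implementation, and the PSO feature selection and the two classifiers are not shown. This variant picks the base sample at random rather than looping over every minority sample in turn as in the original description.

```python
import math
import random


def _euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=None):
    """Create n_synthetic new minority-class samples by SMOTE-style interpolation.

    Hypothetical helper: for each new sample, pick a minority sample x at random,
    pick one of its k nearest minority neighbours x_nn, and return
    x + gap * (x_nn - x) with gap drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples

    # Precompute the indices of the k nearest minority neighbours of each sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: _euclidean(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        x_nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()
        # Interpolate feature by feature along the segment from x to x_nn.
        synthetic.append([a + gap * (b - a) for a, b in zip(x, x_nn)])
    return synthetic


if __name__ == "__main__":
    # Toy minority class (hypothetical data): the four corners of the unit square.
    toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(toy_minority, n_synthetic=3, k=2, seed=0):
        print(point)
```

Oversampling of this kind is normally applied only to the training portion of each dataset, so that synthetic points do not leak into the test data used to compute AUC. For real experiments, a maintained implementation such as the `SMOTE` class in imbalanced-learn is preferable to this sketch.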


Keywords: Software Defect Prediction; Naïve Bayes; Logistic Regression; SMOTE; PSO






Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

STMIK Budi Darma
Secretariat: Jln. Sisingamangaraja No. 338, Tel. 061-7875998

This work is licensed under a Creative Commons Attribution 4.0 International License.