Application of SMOTE to Improve the Performance of Credit Scoring Classification
DOI: https://doi.org/10.30865/jurikom.v10i1.5612
Keywords: Credit scoring, SMOTE, Random Forest, KNN, SVM, MLP
Abstract
Machine learning techniques are widely used in many fields, and training their models requires data. However, the class distribution in most real-world datasets is not always balanced and can be highly imbalanced. When the data is imbalanced, classifier performance is dominated by the majority class, which makes performance difficult to assess. One technique for balancing the data is the Synthetic Minority Oversampling Technique (SMOTE). In this study, SMOTE is applied to credit scoring on the German Credit Data (GCD) dataset, and the resampled data is classified with four methods: random forest, K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Multilayer Perceptron (MLP). The effect of SMOTE on each classifier is measured with recall, precision, F1-score, and AUC. Accuracy is also reported to examine whether it is a suitable measure of performance on imbalanced datasets. In terms of recall, precision, F1-score, and AUC, applying SMOTE to the dataset before classifying it with the four methods increases performance. The highest scores are recall = 82.00% (random forest), precision = 75.35% (MLP), F1-score = 76.93% (MLP), and AUC = 0.832 (random forest). After SMOTE, accuracy decreases slightly for random forest, KNN, and SVM, while it increases slightly for MLP. The contribution of this research is to show that imbalanced data must be handled to improve the performance of classification algorithms, especially on credit rating datasets.
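The abstract relies on two ideas without showing them: how SMOTE creates synthetic minority rows, and why accuracy can mislead on imbalanced data. The sketch below illustrates both under stated assumptions. It is a from-scratch NumPy version of SMOTE-style interpolation (take a minority row, pick one of its k nearest minority neighbours, and place a new point at a random position on the segment between them), not the implementation used in this study, which the abstract does not specify. The function names `smote` and `accuracy_and_minority_recall`, the default `k=5`, and the 70:30 toy data are hypothetical choices for illustration and are not the GCD data. As a simplification, seed rows are drawn at random instead of generating a fixed number of samples from every minority row as the original algorithm does.

```python
import numpy as np


def smote(X_min, n_synthetic, k=5, rng=None):
    """SMOTE-style oversampling sketch (hypothetical helper, not a library API).

    X_min: array of shape (n_min, n_features) holding only minority-class rows.
    n_synthetic: number of synthetic rows to generate.
    k: number of nearest minority neighbours to interpolate towards.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n_min = X_min.shape[0]
    if n_min < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n_min - 1)  # there are only n_min - 1 other minority rows

    # Pairwise Euclidean distances between minority rows only
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a row is never its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # For each synthetic row: a random seed row and one of its k neighbours
    seed = rng.integers(0, n_min, size=n_synthetic)
    mate = neighbours[seed, rng.integers(0, k, size=n_synthetic)]
    # New point lies on the segment seed -> mate, at a random gap in [0, 1)
    gap = rng.random((n_synthetic, 1))
    return X_min[seed] + gap * (X_min[mate] - X_min[seed])


def accuracy_and_minority_recall(y_true, y_pred, minority=1):
    """Return (accuracy, recall on the minority class) for two label arrays."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accuracy = float(np.mean(y_true == y_pred))
    is_min = y_true == minority
    recall = float(np.mean(y_pred[is_min] == minority)) if is_min.any() else 0.0
    return accuracy, recall


# Hypothetical 70:30 toy dataset with two features (not the GCD data)
rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(70, 2))
X_min = rng.normal(2.0, 1.0, size=(30, 2))

# A "classifier" that always predicts the majority class (0) looks 70% accurate
# yet never identifies a single minority case.
y_true = np.array([0] * 70 + [1] * 30)
print(accuracy_and_minority_recall(y_true, np.zeros_like(y_true)))  # (0.7, 0.0)

# Oversample the minority class until both classes have 70 rows
X_syn = smote(X_min, n_synthetic=len(X_maj) - len(X_min), k=5, rng=1)
X_bal = np.vstack([X_maj, X_min, X_syn])
y_bal = np.array([0] * 70 + [1] * (30 + len(X_syn)))
print(X_bal.shape, np.bincount(y_bal))  # (140, 2) [70 70]
```

On the toy data, always predicting the majority class gives 70% accuracy with zero minority recall, which is why recall, precision, F1-score, and AUC are the more informative measures in this setting. In practice a maintained implementation such as `imblearn.over_sampling.SMOTE` from imbalanced-learn would normally be used, and oversampling would be applied to the training data only, so that synthetic rows do not leak into evaluation.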