Applying SMOTE to Improve Credit Scoring Classification Performance

Muhammad Ibnu Choldun Rachmatullah

doi:10.30865/jurikom.v10i1.5612

Authors

Muhammad Ibnu Choldun Rachmatullah, Universitas Logistik dan Bisnis Internasional, Bandung

DOI:

https://doi.org/10.30865/jurikom.v10i1.5612

Keywords:

Credit scoring, SMOTE, Random Forest, KNN, SVM, MLP

Abstract

Machine learning techniques are widely used in many fields, and they need data to train models. The class distribution in most real-world datasets, however, is rarely balanced and can be severely imbalanced. When the data is imbalanced, a classifier's performance depends heavily on the majority class, which makes that performance hard to assess. One technique for balancing the data is the Synthetic Minority Oversampling Technique (SMOTE). In this study, SMOTE is applied to credit scoring on the German Credit Data (GCD) dataset, which is then classified with four methods: random forest, K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Multilayer Perceptron (MLP). The effect of SMOTE on each classifier is measured with recall, precision, F1-Score, and AUC. Accuracy is also measured to check whether it is a suitable performance measure for imbalanced datasets. On recall, precision, F1-Score, and AUC, all four classifiers perform better after SMOTE is applied to the dataset. The highest values are recall = 82.00% with random forest, precision = 75.35% with MLP, F1-Score = 76.93% with MLP, and AUC = 0.832 with random forest. Accuracy after SMOTE decreased slightly for random forest, KNN, and SVM, and increased slightly for MLP. The contribution of this research is to show that imbalanced data needs to be handled to improve the performance of classifier algorithms, especially on credit scoring datasets.
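
The oversampling idea behind SMOTE can be illustrated with a minimal sketch. This is not the code used in the study; it is a plain-Python illustration of the interpolation step described by Chawla et al. The function name `smote`, its parameters, and the toy data are assumptions made for this example, and it assumes purely numeric features.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric feature lists (minority class only).
    Hypothetical helper for illustration, not the study's implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbors than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbors of the base sample (Euclidean distance)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy data (made up): 6 majority and 3 minority samples, two numeric features.
majority = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [1.1, 1.3], [0.8, 0.9], [1.3, 1.0]]
minority = [[3.0, 3.0], [3.2, 2.9], [2.8, 3.3]]

# Generate enough synthetic minority points to match the majority count.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints: 6 6
```

Each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbors, so the minority class grows without exact duplicates. The abstract does not say how the GCD features were encoded; categorical attributes would need encoding or a SMOTE variant designed for them before this kind of interpolation makes sense.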

