Optimizing Random Forest Performance with Random Oversampling and SMOTE on a Diabetes Dataset

 (*) Hasbi Hasbi (Universitas Amikom Yogyakarta, Yogyakarta, Indonesia)
 Theopilus Bayu Sasongko (Universitas Amikom Yogyakarta, Yogyakarta, Indonesia)

(*) Corresponding Author

Submitted: June 28, 2024; Published: July 27, 2024

Abstract

Diabetes is a chronic condition characterized by high blood sugar that requires careful monitoring; if left untreated, it can lead to severe complications. Accurate diagnosis is hampered by class imbalance in diabetes datasets, which can lower a classifier's accuracy. This research therefore aims to improve diabetes classification accuracy by applying two balancing methods, the Synthetic Minority Over-sampling Technique (SMOTE) and Random Oversampling, to data from patients diagnosed with diabetes and from patients without the disease. The first step addressed class imbalance by applying SMOTE, which generates synthetic minority-class samples, and random oversampling, which duplicates existing minority-class samples. The data were then normalized with the min-max method, and a Random Forest Classifier was trained on the result. This approach improves the model's ability to identify diabetes cases, reaching an accuracy of 96%, one percentage point higher than the 95% reported in previous research.
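The preprocessing steps summarized above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' code: the function names `random_oversample`, `smote`, and `min_max_normalize`, the binary-label assumption, the neighbour count `k`, and the toy data are all assumptions. Random oversampling copies existing minority rows, whereas this simplified SMOTE interpolates between a random minority row and one of its k nearest minority neighbours; both stop once the two classes are equal in size. Min-max normalization rescales each feature to [0, 1] with (x - min) / (max - min).

```python
import math
import random


def random_oversample(X, y, minority, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equal in size.

    Assumes binary labels: every label other than `minority` counts as majority.
    """
    rng = random.Random(seed)
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    n_majority = len(y) - len(minority_rows)
    extra = [list(rng.choice(minority_rows)) for _ in range(n_majority - len(minority_rows))]
    return X + extra, y + [minority] * len(extra)


def smote(X, y, minority, k=5, seed=0):
    """Simplified SMOTE: interpolate between a random minority row and one of its
    k nearest minority neighbours (Euclidean distance) until the classes are equal."""
    rng = random.Random(seed)
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    if len(minority_rows) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    n_majority = len(y) - len(minority_rows)
    k = min(k, len(minority_rows) - 1)
    synthetic = []
    for _ in range(n_majority - len(minority_rows)):
        i = rng.randrange(len(minority_rows))
        base = minority_rows[i]
        # Candidate neighbours: all other minority rows, sorted by distance to `base`.
        others = [row for j, row in enumerate(minority_rows) if j != i]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return X + synthetic, y + [minority] * len(synthetic)


def min_max_normalize(X):
    """Rescale each feature column to [0, 1] via (x - min) / (max - min)."""
    columns = list(zip(*X))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in X
    ]


if __name__ == "__main__":
    # Hypothetical toy data: [glucose, BMI]; label 1 = diabetic (minority), 0 = non-diabetic.
    X = [[85, 26.6], [89, 28.1], [95, 25.0], [100, 30.5], [90, 24.2], [99, 27.3],
         [148, 33.6], [183, 23.3], [137, 43.1]]
    y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

    X_ros, y_ros = random_oversample(X, y, minority=1)
    X_sm, y_sm = smote(X, y, minority=1, k=2)
    print(y_ros.count(0), y_ros.count(1))  # 6 6
    print(y_sm.count(0), y_sm.count(1))    # 6 6
    X_norm = min_max_normalize(X_sm)       # every feature now lies in [0, 1]
```

The balanced, normalized features would then be passed to a Random Forest; with scikit-learn, for instance, that step could be `RandomForestClassifier().fit(X_norm, y_sm)`. The abstract does not state whether resampling was applied before or after the train/test split; resampling only the training split is the usual way to keep duplicated or synthetic rows out of the evaluation data.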

Keywords


Random Forest; Random Oversampling; SMOTE; Diabetes


Copyright (c) 2024 JURNAL MEDIA INFORMATIKA BUDIDARMA

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.



JURNAL MEDIA INFORMATIKA BUDIDARMA
STMIK Budi Darma
Secretariat: Sisingamangaraja No. 338, Tel. 061-7875998
Email: mib.stmikbd@gmail.com