Application of Machine Learning Methods and the SMOTE Technique for Diabetes Prediction
DOI: https://doi.org/10.30865/json.v7i2.9032
Keywords: Knowledge Discovery; Diabetes Prediction; Machine Learning; Random Forest; XGBoost; SMOTE
Abstract
Diabetes is a non-communicable disease whose prevalence continues to rise both globally and in Indonesia. If it is not detected early, it carries the risk of serious complications such as heart disease, stroke, and kidney failure. Data-driven prediction methods are therefore needed to support fast, accurate, and efficient early detection. This study compares the performance of four machine learning algorithms, namely Random Forest, XGBoost, Support Vector Machine (SVM), and K-Nearest Neighbor (KNN), in predicting diabetes using a public dataset from Kaggle. The study follows the Knowledge Discovery in Databases (KDD) framework, whose stages are data selection; preprocessing (data cleaning, transformation, and normalization); class balancing with the Synthetic Minority Over-sampling Technique (SMOTE); an 80:20 split into training and test data; algorithm implementation; and performance evaluation. The models are evaluated with Accuracy, Precision, Recall, and F1-Score to give a comprehensive picture of prediction quality. Random Forest and XGBoost performed best, each scoring 0.97 on Accuracy, Precision, Recall, and F1-Score. KNN performed reasonably well with a score of 0.94, while SVM scored lowest at 0.89. These findings indicate that applying the KDD framework together with SMOTE can produce highly accurate prediction models. Random Forest and XGBoost are recommended as the leading algorithms for similar studies, particularly on datasets with imbalanced classes.
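The preprocessing, balancing, and evaluation steps named in the abstract can be sketched in plain Python. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' code: the function names (`split_80_20`, `min_max_scaler`, `smote`, `oversample`, `binary_metrics`) are hypothetical, the task is assumed to be binary with label 1 as the diabetic (minority) class, and the four classifiers themselves are omitted. SMOTE is shown in a simplified form that picks base samples at random, and the sketch applies it to the training split only; the abstract lists class balancing before the 80:20 split, so the authors' exact order may differ. The abstract also does not say how the single reported score per metric was averaged, so the metric function assumes positive-class (binary) Precision, Recall, and F1-Score.

```python
import math
import random

# Hypothetical helpers; label 1 is assumed to be the diabetic (minority) class.


def split_80_20(rows, labels, seed=42):
    """Shuffle once and keep 80% for training, 20% for testing (not stratified)."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = round(0.8 * len(idx))

    def pick(ids):
        return [rows[i] for i in ids], [labels[i] for i in ids]

    # Returns (X_train, y_train, X_test, y_test).
    return pick(idx[:cut]) + pick(idx[cut:])


def min_max_scaler(train_rows):
    """Fit per-feature min/max on training rows only and return a transform.

    Training rows map into [0, 1]; test rows may fall slightly outside.
    Constant features map to 0.0 to avoid division by zero.
    """
    mins = [min(col) for col in zip(*train_rows)]
    maxs = [max(col) for col in zip(*train_rows)]

    def transform(rows):
        return [[(x - lo) / (hi - lo) if hi > lo else 0.0
                 for x, lo, hi in zip(row, mins, maxs)] for row in rows]

    return transform


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling: each synthetic point lies on the segment
    between a random minority sample and one of its k nearest minority
    neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    out = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        neighbours = sorted((j for j in range(len(minority)) if j != i),
                            key=lambda j: math.dist(base, minority[j]))[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        out.append([b + gap * (o - b) for b, o in zip(base, other)])
    return out


def oversample(rows, labels, minority_label=1, k=5, seed=0):
    """Append synthetic minority rows until both classes are equal (binary case)."""
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    deficit = (len(labels) - len(minority)) - len(minority)
    new = smote(minority, deficit, k, seed) if deficit > 0 else []
    return rows + new, labels + [minority_label] * len(new)


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, Precision, Recall and F1-Score with `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

A plausible call order is `split_80_20` on the raw rows, `min_max_scaler` fitted on the training rows and applied to both splits, `oversample` on the scaled training rows, then any classifier (for example a Random Forest or KNN from a library of choice), and finally `binary_metrics` on the test labels and predictions. For instance, `binary_metrics([1, 1, 0, 0, 0, 0, 0, 1], [1, 0, 1, 0, 0, 0, 0, 1])` gives an accuracy of 0.75 and a Precision, Recall, and F1-Score of 2/3 each.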
License
Copyright (c) 2025 Jurnal Sistem Komputer dan Informatika (JSON)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).

