Comparative Analysis of Random Forest, XGBoost, and Logistic Regression Methods for Early Diabetes Detection Classification
DOI: https://doi.org/10.30865/jurikom.v12i6.9392
Keywords: Diabetes, XGBoost, Random Forest, Logistic Regression, SMOTE
Abstract
Diabetes Mellitus is a chronic disease whose prevalence continues to rise, posing serious public-health challenges and contributing substantially to the global economic burden. Because early symptoms are often non-specific, diagnosis is frequently delayed, which highlights the need for accurate early-detection approaches to support clinical decision-making. This study analyzes and compares the performance of three machine learning algorithms (Logistic Regression, Random Forest, and XGBoost) in classifying diabetes risk from clinical parameters including age, body mass index (BMI), blood pressure, glucose level, and HbA1c. The data were obtained from the Diabetes Prediction Dataset, which contains 100,000 records. The research process involved handling missing data, applying One-Hot Encoding to categorical variables, normalizing numerical features, and addressing class imbalance with the Synthetic Minority Over-sampling Technique (SMOTE). Model performance was evaluated with Accuracy, Precision, Recall, F1-Score, and ROC-AUC to provide a comprehensive assessment. XGBoost achieved the best performance, with an accuracy of 96.88% and a ROC-AUC of 98.00%. Random Forest attained an accuracy of 95.68% with an F1-Score of 74.76%, while Logistic Regression recorded the lowest accuracy (88.96%) but the highest recall (89.12%). These findings suggest that ensemble learning methods, particularly boosting approaches, are more effective at improving the accuracy of diabetes versus non-diabetes classification. The main contribution of this study is a multi-metric comparative analysis that can serve as a reference for selecting the most effective machine learning model when developing medical decision support systems for early diabetes detection.
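The class-balancing and evaluation steps summarized above can be illustrated with a minimal, dependency-free Python sketch. It assumes binary labels (1 = diabetes, 0 = non-diabetes) and numeric features that have already been encoded and normalized. The helpers `smote`, `classification_metrics`, and `roc_auc` are hypothetical names written for illustration, not the authors' code. `smote` follows only the core idea of SMOTE (placing a synthetic point on the segment between a minority sample and one of its k nearest minority neighbors) and omits refinements found in library implementations.

```python
import math
import random
from typing import Dict, List, Sequence


def smote(minority: List[List[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbors.
    Assumes at least two minority samples with numeric, normalized features.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbors than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest *other* minority samples by Euclidean distance
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


def classification_metrics(y_true: Sequence[int],
                           y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, Precision, Recall and F1 for the positive class (1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive is scored above a
    random negative (ties count as 1/2). O(n_pos * n_neg), for illustration."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
    print(len(smote(minority, n_new=4, k=2)))  # 4
    y_true = [1, 0, 1, 1, 0, 0]
    print(classification_metrics(y_true, [1, 0, 0, 1, 1, 0]))
    print(roc_auc(y_true, [0.9, 0.2, 0.4, 0.8, 0.5, 0.1]))
```

In practice one would more likely use `SMOTE` from `imblearn.over_sampling` (via `fit_resample`) and the functions in `sklearn.metrics` such as `roc_auc_score`. Either way, oversampling is normally applied only to the training split, so that synthetic points do not leak into the data used for evaluation.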