Performance Analysis of Machine Learning Models in the Classification of Type 2 Diabetes

Authors

  • Ryan Hidayatulloh, Universitas Dian Nuswantoro, Semarang
  • Wahyu Aji Eko Prabowo, Universitas Dian Nuswantoro, Semarang

DOI:

https://doi.org/10.30865/jurikom.v12i4.8747

Keywords:

Diabetes Mellitus, Machine Learning, Random Forest, Disease Prediction, Model Evaluation

Abstract

Type 2 Diabetes Mellitus is a chronic disease that develops gradually and can lead to serious complications, such as heart disease, kidney failure, and blindness, if not detected early. This study aims to evaluate and compare the performance of four machine learning algorithms (Logistic Regression, Random Forest, Multilayer Perceptron, and Deep Neural Network) in predicting the risk of type 2 diabetes based on medical data. The analysis uses the Pima Indians Diabetes dataset, which contains 9,538 patient records and 16 predictor variables. We split the data into training and testing sets using an 80:20 ratio. During training, we performed hyperparameter tuning using Grid Search combined with cross-validation. To evaluate model performance, we applied several metrics, including accuracy, precision, recall, F1-score, Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R², along with an analysis of overfitting. The results indicate that the Random Forest model outperformed the others, achieving 100% accuracy, zero classification errors, near-zero prediction error values, and no signs of overfitting. Logistic Regression also performed well, though slightly below Random Forest. In contrast, the Multilayer Perceptron and Deep Neural Network models showed mild overfitting and higher false negative rates. Based on these findings, we recommend the Random Forest model as the most reliable option for early prediction systems for type 2 diabetes mellitus.
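
For orientation only, below is a minimal sketch of the evaluation pipeline described in the abstract, written in Python with scikit-learn; it is not the authors' code. It assumes a local file named diabetes.csv with a binary Outcome column (both names are assumptions), performs the 80:20 split, tunes a Random Forest with Grid Search and 5-fold cross-validation, and computes the metrics reported above.

    # Minimal sketch, not the authors' implementation.
    # File name "diabetes.csv" and target column "Outcome" are assumptions.
    import pandas as pd
    from sklearn.model_selection import train_test_split, GridSearchCV
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 f1_score, mean_squared_error, r2_score)

    df = pd.read_csv("diabetes.csv")
    X, y = df.drop(columns=["Outcome"]), df["Outcome"]

    # 80:20 train/test split, stratified to preserve the class ratio
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    # Hyperparameter tuning: Grid Search combined with 5-fold cross-validation
    param_grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}
    grid = GridSearchCV(RandomForestClassifier(random_state=42),
                        param_grid, cv=5, scoring="f1")
    grid.fit(X_train, y_train)

    # Evaluation with the metrics listed in the abstract
    y_pred = grid.best_estimator_.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print("accuracy :", accuracy_score(y_test, y_pred))
    print("precision:", precision_score(y_test, y_pred))
    print("recall   :", recall_score(y_test, y_pred))
    print("F1-score :", f1_score(y_test, y_pred))
    print("MSE      :", mse, "RMSE:", mse ** 0.5)
    print("R2       :", r2_score(y_test, y_pred))

The same procedure can be repeated with LogisticRegression and MLPClassifier from scikit-learn (and a separately built deep neural network, e.g. in Keras) to reproduce the four-model comparison, with the gap between training and testing scores used as a rough check for overfitting.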

Published

2025-08-14

How to Cite

Hidayatulloh, R., & Prabowo, W. A. E. (2025). Analisis Performansi Model Machine Learning dalam Klasifikasi Penyakit Diabetes Tipe 2. JURNAL RISET KOMPUTER (JURIKOM), 12(4), 414–421. https://doi.org/10.30865/jurikom.v12i4.8747