Diabetes Classification on an Imbalanced Class Dataset Using the Stacking Algorithm

Authors

  • Yoga Pristyanto Universitas Amikom Yogyakarta, Yogyakarta http://orcid.org/0000-0002-1358-4864
  • Acihmah Sidauruk Universitas Amikom Yogyakarta, Yogyakarta
  • Atik Nurmasani Universitas Amikom Yogyakarta, Yogyakarta

DOI:

https://doi.org/10.30865/mib.v6i1.3442

Keywords:

Diabetes, Classification, Imbalanced Class, Meta-Learning, Stacking Algorithm

Abstract

Diabetes is a potentially fatal disease. The International Diabetes Federation (IDF) reported that 463 million people worldwide had the disease in 2019. According to the Ministry of Health, Indonesia ranks among the ten countries with the highest number of people with diabetes. Machine learning models can support early detection of diabetes from patient history and other available data. Most previous studies use a single classifier, which is weak when the classes in the dataset are imbalanced. This study therefore uses a Stacking model for classification and prediction on diabetes datasets. The goal is to improve on the performance of a single classifier and to offer one solution for diabetes classification on imbalanced class datasets. Two experiments were carried out on two different datasets. On the first dataset, the Stacking algorithm achieved an accuracy of 89%, a TPR of 89%, a TNR of 85%, and a G-Mean of 86.98%. On the second dataset, it achieved an accuracy of 96%, a TPR of 96%, a TNR of 94%, and a G-Mean of 94.99%. These results are better than those of single classifiers such as C4.5, K-NN, and SVM on all four indicators in both diabetes datasets. Thus, the proposed algorithm, Stacking (C4.5-SVM), can be a solution for classifying diabetes datasets with an imbalanced class distribution.
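
The G-Mean figures in the abstract can be checked from the reported TPR and TNR. The sketch below assumes the usual definition, G-Mean = sqrt(TPR × TNR), which the abstract does not state but which matches the reported values. The function names and the confusion-matrix counts in the last example are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def rates_from_confusion(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (TPR, TNR) from binary confusion-matrix counts."""
    tpr = tp / (tp + fn)  # true positive rate (sensitivity)
    tnr = tn / (tn + fp)  # true negative rate (specificity)
    return tpr, tnr


def g_mean(tpr: float, tnr: float) -> float:
    """Geometric mean of TPR and TNR (assumed definition, see lead-in)."""
    return math.sqrt(tpr * tnr)


# TPR and TNR as reported in the abstract, expressed as fractions.
first_dataset = g_mean(0.89, 0.85)
second_dataset = g_mean(0.96, 0.94)
print(f"first dataset G-Mean:  {first_dataset:.4f}")
print(f"second dataset G-Mean: {second_dataset:.4f}")

# Hypothetical imbalanced case: 10 positives, 90 negatives, and a model
# that predicts "negative" for every sample. Accuracy is 90%, but the
# G-Mean is 0 because no positive case is detected.
tpr, tnr = rates_from_confusion(tp=0, fn=10, tn=90, fp=0)
accuracy = (0 + 90) / 100
print(f"accuracy: {accuracy:.2f}, G-Mean: {g_mean(tpr, tnr):.2f}")
```

The last case shows why accuracy alone can mislead on an imbalanced class dataset, and why the study reports TPR, TNR, and G-Mean alongside it.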

Published

2022-01-25