Classification of Diabetes on an Imbalanced Class Dataset Using the Stacking Algorithm
DOI: https://doi.org/10.30865/mib.v6i1.3442
Keywords: Diabetes, Classification, Imbalanced Class, Meta-Learning, Stacking Algorithm
Abstract
Diabetes is a potentially fatal disease. According to a report from the International Diabetes Federation (IDF), 463 million people worldwide had diabetes in 2019. According to the Ministry of Health, Indonesia ranks among the top 10 countries in the world by number of people with diabetes. Machine learning models can support the early detection of diabetes based on patient history and available data. Most previous studies use a single classifier, which is weak when the dataset has an imbalanced class distribution. This study therefore uses a Stacking model for classification and prediction on diabetes datasets, with the goal of improving on the performance of a single classifier. The Stacking model is also expected to be one solution for classifying diabetes in imbalanced class datasets. In two experiments on two different datasets, the Stacking algorithm achieved an accuracy of 89%, a TPR of 89%, a TNR of 85%, and a G-Mean of 86.98% on the first dataset, and an accuracy of 96%, a TPR of 96%, a TNR of 94%, and a G-Mean of 94.99% on the second. These results are better than those of the single classifiers C4.5, K-NN, and SVM on all four indicators in both diabetes datasets. Thus, the proposed algorithm, Stacking (C4.5-SVM), can be a solution for classifying diabetes datasets with imbalanced class distributions.
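The G-Mean values reported in the abstract are consistent with the usual definition of G-Mean as the geometric mean of TPR and TNR. The short Python sketch below assumes that definition and binary confusion-matrix counts; the function names `rates` and `g_mean` and the confusion-matrix counts in the last example are hypothetical and are not taken from the paper. It also shows why accuracy alone can mislead on an imbalanced dataset: a model that always predicts the majority class scores high accuracy but a G-Mean of zero.

```python
import math


def rates(tp, fn, tn, fp):
    """Return (accuracy, TPR, TNR) from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    tpr = tp / (tp + fn)  # true positive rate (sensitivity)
    tnr = tn / (tn + fp)  # true negative rate (specificity)
    return accuracy, tpr, tnr


def g_mean(tpr, tnr):
    """Geometric mean of TPR and TNR (assumed definition of G-Mean)."""
    return math.sqrt(tpr * tnr)


# TPR and TNR as reported in the abstract for the two datasets.
first = round(100 * g_mean(0.89, 0.85), 2)
second = round(100 * g_mean(0.96, 0.94), 2)
print(first, second)  # 86.98 94.99

# Hypothetical counts: 10 positive and 90 negative cases, and a model
# that labels every case as negative (the majority class).
acc, tpr, tnr = rates(tp=0, fn=10, tn=90, fp=0)
majority_g_mean = g_mean(tpr, tnr)
print(acc, majority_g_mean)  # 0.9 0.0
```

Because G-Mean multiplies the two rates, it drops to zero as soon as either class is missed entirely, which is why it is a common companion to accuracy when classes are imbalanced.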
License

This work is licensed under a Creative Commons Attribution 4.0 International License