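Classification of Diabetes in an Imbalanced Class Dataset Using the Stacking Algorithm
(Indonesian title: Klasifikasi Penyakit Diabetes Pada Imbalanced Class Dataset Menggunakan Algoritme Stacking)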

Yoga Pristyanto; Acihmah Sidauruk; Atik Nurmasani

doi:10.30865/mib.v6i1.3442

Authors

Yoga Pristyanto, Universitas Amikom Yogyakarta, Yogyakarta (ORCID: http://orcid.org/0000-0002-1358-4864)
Acihmah Sidauruk, Universitas Amikom Yogyakarta, Yogyakarta
Atik Nurmasani, Universitas Amikom Yogyakarta, Yogyakarta


Keywords:

Diabetes, Classification, Imbalanced Class, Meta-Learning, Stacking Algorithm

Abstract

Diabetes is a potentially fatal disease. According to the International Diabetes Federation (IDF), 463 million people worldwide were living with diabetes in 2019. According to the Indonesian Ministry of Health, Indonesia ranks among the ten countries with the highest number of people with diabetes. Machine learning models can support the early detection of diabetes from patient history and other available data. Most previous studies rely on a single classifier, and single classifiers tend to perform poorly when the classes in a dataset are imbalanced. This study therefore applies a Stacking model to classify and predict diabetes, with the aim of improving on the performance of single classifiers and offering a solution for diabetes classification on imbalanced class datasets. Two experiments were carried out on two different diabetes datasets. On the first dataset, the Stacking algorithm achieved an accuracy of 89%, a true positive rate (TPR) of 89%, a true negative rate (TNR) of 85%, and a G-Mean of 86.98%; on the second dataset, it achieved an accuracy of 96%, a TPR of 96%, a TNR of 94%, and a G-Mean of 94.99%. On both datasets, these results exceed those of single classifiers such as C4.5, K-NN, and SVM on all four evaluation metrics. The proposed algorithm, Stacking (C4.5-SVM), can therefore be a solution for classifying diabetes datasets with imbalanced class distributions.
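To make the evaluation concrete, the sketch below stacks two base classifiers under a meta-learner and scores each model with accuracy, TPR, TNR and G-Mean, taken here as the geometric mean sqrt(TPR × TNR). On imbalanced data this metric is informative because it falls toward zero when either class is poorly recognised, whereas accuracy can stay high by favouring the majority class. Recomputing it from the abstract's rounded TPR/TNR values gives 86.98% and 94.99%, matching the reported figures. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn, synthetic imbalanced data from `make_classification`, an entropy-criterion CART tree as a stand-in for C4.5 (scikit-learn does not ship C4.5), and logistic regression as the meta-learner (the abstract does not name one). The helper names `g_mean`, `evaluate` and `build_models` are hypothetical, and TPR/TNR follow the usual binary definitions, which may differ from how the paper averaged them.

```python
# Minimal stacking sketch (assumptions, not the paper's setup): scikit-learn,
# synthetic imbalanced data, an entropy-based CART tree standing in for C4.5,
# and logistic regression as the meta-learner.
import math

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def g_mean(tpr: float, tnr: float) -> float:
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    return math.sqrt(tpr * tnr)


def evaluate(model, X_train, X_test, y_train, y_test) -> dict:
    """Fit `model`, then score it on the test split (class 1 = positive)."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # With labels=[0, 1], ravel() returns the counts in the order tn, fp, fn, tp.
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    tpr = tp / (tp + fn)  # recall on the positive (diabetic) class
    tnr = tn / (tn + fp)  # recall on the negative class
    return {
        "accuracy": float(accuracy_score(y_test, y_pred)),
        "TPR": float(tpr),
        "TNR": float(tnr),
        "G-Mean": g_mean(tpr, tnr),
    }


def build_models(seed: int = 0) -> dict:
    """Three single classifiers plus a tree + SVM stack (hypothetical configuration)."""

    def tree():
        # Stand-in for C4.5: scikit-learn implements CART only; entropy
        # (information gain) is the closest available split criterion.
        return DecisionTreeClassifier(criterion="entropy", random_state=seed)

    def svm():
        return make_pipeline(StandardScaler(), SVC())

    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
    stack = StackingClassifier(
        estimators=[("tree", tree()), ("svm", svm())],
        final_estimator=LogisticRegression(),
        cv=5,  # meta-learner is trained on out-of-fold base-model outputs
    )
    return {"C4.5-like tree": tree(), "K-NN": knn, "SVM": svm(), "Stacking": stack}


def main() -> None:
    # Hypothetical data: roughly 65% negatives and 35% positives.
    X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                               weights=[0.65, 0.35], random_state=42)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=42)
    for name, model in build_models().items():
        scores = evaluate(model, X_tr, X_te, y_tr, y_te)
        print(name, {k: round(v, 3) for k, v in scores.items()})
    # G-Mean recomputed from the abstract's rounded TPR/TNR values.
    print(round(g_mean(0.89, 0.85), 4), round(g_mean(0.96, 0.94), 4))  # 0.8698 0.9499


if __name__ == "__main__":
    main()
```

`StackingClassifier` builds the meta-learner's training features from cross-validated (out-of-fold) predictions of the base learners and then refits the base learners on the whole training split, so the second level does not simply learn the base models' training-set overfit. Running it on a real diabetes dataset only requires replacing the `make_classification` call with the loaded features and labels.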

References

Kementerian Kesehatan Republik Indonesia, “Tetap Produktif, Cegah dan Atasi Diabetes Mellitus” [Stay Productive, Prevent and Manage Diabetes Mellitus], Pusat Data dan Informasi Kementerian Kesehatan RI, 2020.

A. Kantono, I. Y. Purbasari, and F. T. Anggraeny, “Penerapan pruning pada algoritma c5.0 untuk mendiagnosis penyakit diabetes melitus 1” [Applying pruning to the C5.0 algorithm to diagnose diabetes mellitus 1], no. September, pp. 184–189, 2019.

Rousiyati, A. Nur Rais, E. Rahmawati, and R. Faizal Amir, “Prediksi Pima Indians Diabetes Database Dengan Ensemble Adaboost Dan Bagging” [Prediction on the Pima Indians Diabetes Database with AdaBoost and Bagging ensembles], J. Sains dan Manaj., vol. 9, no. 2, pp. 36–42, 2021.

M. M. F. Islam, R. Ferdousi, S. Rahman, and H. Y. Bushra, “Likelihood Prediction of Diabetes at Early Stage Using Data Mining Techniques,” in Computer Vision and Machine Intelligence in Medical Image Analysis, 2020, pp. 113–125.

J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed. Massachusetts: Morgan Kaufmann, 2011.

N. Abdulhadi and A. Al-Mousa, “Diabetes Detection Using Machine Learning Classification Methods,” in 2021 International Conference on Information Technology (ICIT), Jul. 2021, pp. 350–354, doi: 10.1109/ICIT52682.2021.9491788.

H. Ahmed, E. M. G. Younis, and A. A. Ali, “Predicting Diabetes using Distributed Machine Learning based on Apache Spark,” in 2020 International Conference on Innovative Trends in Communication and Computer Engineering (ITCE), Feb. 2020, pp. 44–49, doi: 10.1109/ITCE48509.2020.9047795.

G. Tripathi and R. Kumar, “Early Prediction of Diabetes Mellitus Using Machine Learning,” ICRITO 2020 - IEEE 8th Int. Conf. Reliab. Infocom Technol. Optim. (Trends Futur. Dir.), pp. 1009–1014, 2020, doi: 10.1109/ICRITO48877.2020.9197832.

V. Lopatka, I. Meniailov, and K. Bazilevych, “Classification and Prediction of Diabetes Disease Using Modified k-neighbors Method,” in 2021 IEEE 12th International Conference on Electronics and Information Technologies (ELIT), May 2021, pp. 46–50, doi: 10.1109/ELIT53502.2021.9501151.

K. Alpan and G. S. İlgi, “Classification of Diabetes Dataset with Data Mining Techniques by Using WEKA Approach,” in 2020 4th International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), Oct. 2020, pp. 1–7, doi: 10.1109/ISMSIT50672.2020.9254720.

S. K. Reddy, T. Krishnaveni, G. Nikitha, and E. Vijaykanth, “Diabetes Prediction Using Different Machine Learning Algorithms,” in 2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA), 2021, pp. 1261–1265, doi: 10.1109/ICIRCA51532.2021.9544593.

M. Yusa, E. Utami, and E. Luthfi. Taufiq, “Evaluasi Performa Algoritma Klasifikasi Decision Tree ID3, C4.5, dan CART Pada Dataset Readmisi Pasien Diabetes” [Performance evaluation of the ID3, C4.5, and CART decision tree classification algorithms on a diabetes patient readmission dataset], Infosys (Information Syst. J.), vol. 4, no. 1, pp. 23–34, 2016.

D. Vigneswari, N. K. Kumar, V. Ganesh Raj, A. Gugan, and S. R. Vikash, “Machine Learning Tree Classifiers in Predicting Diabetes Mellitus,” 2019 5th Int. Conf. Adv. Comput. Commun. Syst. ICACCS 2019, pp. 84–87, 2019, doi: 10.1109/ICACCS.2019.8728388.

Fatmawati, “Perbandingan Algoritma Klasifikasi Data Mining Model C4.5 Dan Naive Bayes Untuk Prediksi Penyakit Diabetes” [Comparison of the C4.5 and Naive Bayes data mining classification algorithms for diabetes prediction], J. Techno Nusa Mandiri, vol. 1, no. 3, p. 137, 2016.

D. Sisodia and D. S. Sisodia, “Prediction of Diabetes using Classification Algorithms,” Procedia Comput. Sci., vol. 132, pp. 1578–1585, 2018, doi: 10.1016/j.procs.2018.05.122.

R. S. Raj, D. S. Sanjay, M. Kusuma, and S. Sampath, “Comparison of Support Vector Machine and Naïve Bayes Classifiers for Predicting Diabetes,” in 2019 1st International Conference on Advanced Technologies in Intelligent Control, Environment, Computing Communication Engineering (ICATIECE), Mar. 2019, pp. 41–45, doi: 10.1109/ICATIECE45860.2019.9063792.

A. Nurmasani and Y. Pristyanto, “Algoritme Stacking untuk Klasifikasi Penyakit Jantung pada Dataset Imbalanced Class” [Stacking algorithm for heart disease classification on imbalanced class datasets], J. Pseudocode, vol. VIII, pp. 21–26, Feb. 2021.

Peter Turney, “Pima Diabetes Dataset,” National Institute of Diabetes and Digestive and Kidney Diseases, 1990. https://www.kaggle.com/uciml/pima-indians-diabetes-database/.

N. Chanamarn, K. Tamee, and P. Sittidech, “Stacking technique for academic achievement prediction,” in Int. Work. Smart Info-Media Syst. Asia (SISA 2016), pp. 14–17, 2016.

Q. Wang, “A hybrid sampling SVM approach to imbalanced data classification,” Abstr. Appl. Anal., vol. 2014, 2014, doi: 10.1155/2014/972786.

H. R. Sanabila and W. Jatmiko, “Ensemble Learning on Large Scale Financial Imbalanced Data,” 2018 Int. Work. Big Data Inf. Secur. IWBIS 2018, pp. 93–98, 2018, doi: 10.1109/IWBIS.2018.8471702.
