Implementation of XGBoost on Balancing the Liver Patient Dataset with SMOTE and Bayesian Search Hyperparameter Tuning

Authors

  • Rahmad Ubaidillah Universitas Lambung Mangkurat, Banjarbaru
  • Muliadi Muliadi Universitas Lambung Mangkurat, Banjarbaru
  • Dodon Turianto Nugrahadi Universitas Lambung Mangkurat, Banjarbaru
  • M Reza Faisal Universitas Lambung Mangkurat, Banjarbaru
  • Rudy Herteno Universitas Lambung Mangkurat, Banjarbaru

DOI:

https://doi.org/10.30865/mib.v6i3.4146

Keywords:

Classification, XGBoost, SMOTE, Hyperparameter Tuning, Bayesian Search

Abstract

Liver disease is a disorder of liver function caused by viral or bacterial infection or by toxic substances, which prevents the liver from functioning properly. Early diagnosis is important, and a classification algorithm can support it. Using the Indian Liver Patient Dataset, a classifier can predict whether or not a patient has liver disease. However, the dataset is imbalanced between patients with liver disease and those without, which can reduce the performance of the prediction model because its predictions tend to be non-specific. In this study, classification is performed with XGBoost, to which SMOTE is added to address the class imbalance, Bayesian search hyperparameter tuning is added to improve model performance, or both. The XGBoost model alone achieved an AUC of 0.618; XGBoost with Bayesian search, 0.658; XGBoost with SMOTE, 0.716; and XGBoost with SMOTE and Bayesian search, 0.767. Of the four models, XGBoost with SMOTE and Bayesian search obtained the highest AUC, 0.149 higher than that of XGBoost without SMOTE or Bayesian search.
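To make the two recurring ideas in the abstract concrete, the following is a minimal sketch of how SMOTE creates synthetic minority-class samples and how an AUC value can be computed. It is not the authors' code: the abstract does not say which implementation they used, and the function names `smote` and `auc`, the parameters `k` and `seed`, and the toy 2-D points are all assumptions made for illustration. The sketch uses only the Python standard library.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority-class neighbours.

    Illustrative sketch of the SMOTE idea, not a library implementation.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    if k < 1:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (Euclidean distance)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # A tie between a positive and a negative score counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not taken from the liver patient dataset).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points))                               # 6
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))      # 0.75
```

Each synthetic point lies on a line segment between two existing minority samples, so oversampling adds variety rather than exact duplicates. In practice, oversampling is usually applied to the training split only, so that the AUC is measured on unmodified data; whether the study did so is not stated in the abstract.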

Published

2022-07-25
