Improving the Performance of the XGBoost Classifier Machine Learning Model through Oversampling Techniques in AIDS Prediction

Duta Firdaus Wicaksono, Ruri Suko Basuki, Dicky Setiawan

Abstract


HIV (Human Immunodeficiency Virus) has caused tens of millions of deaths worldwide; in 2022 alone, 630,000 people died from HIV-related illnesses and 1.3 million people were newly infected. Without treatment, HIV can progress to AIDS (Acquired Immune Deficiency Syndrome), which weakens the immune system and increases the risk of infections and other diseases. Despite advances in treatment, early detection of AIDS remains a priority. This research develops an AIDS prediction model using machine learning, an approach that has proven effective for predicting health outcomes. However, class imbalance in the data makes it difficult for the model to predict the rare AIDS cases. To address this problem, oversampling techniques are used to balance the class distribution by adding minority-class samples. This study explores oversampling techniques including SMOTE, ADASYN, and Random Oversampling, each combined with the XGBoost algorithm. The results show that Random Oversampling combined with the XGBoost Classifier yields the best performance, with an accuracy of 94.44%, precision of 90.72%, recall of 98.74%, and an F1-score of 94.65%. This research is expected to provide valuable insights for healthcare practitioners and the public in efforts to control the spread of AIDS globally.
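As a rough illustration of the Random Oversampling idea named in the abstract, the following Python sketch duplicates randomly chosen minority-class rows until every class is as large as the biggest one. It is not the authors' implementation: the function name `random_oversample`, the toy data, and the seed are assumptions made for illustration, and the XGBoost training step is left out.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Return copies of rows/labels in which every class has as many
    rows as the largest class, by duplicating randomly chosen rows."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    counts = Counter(labels)
    target = max(counts.values())      # size of the majority class

    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows that belong to this class
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw (with replacement) as many extra rows as the class is short
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Hypothetical toy data: 8 majority-class rows (label 0), 2 minority rows (label 1)
rows = [[i, i * 2] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))        # both classes now have 8 rows
```

Unlike this duplication approach, SMOTE and ADASYN generate new synthetic minority samples by interpolating between existing ones. Resampling of this kind is usually applied to the training split only, so that duplicated rows do not leak into the evaluation data; the abstract does not say how the study handled this.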

Keywords


AIDS; Imbalanced Data; Machine Learning; Oversampling; Prediction; XGBoost Classifier



DOI: https://doi.org/10.30865/mib.v8i2.7501



Copyright (c) 2024 JURNAL MEDIA INFORMATIKA BUDIDARMA

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.



JURNAL MEDIA INFORMATIKA BUDIDARMA
Universitas Budi Darma
Secretariat: Sisingamangaraja No. 338 Telp 061-7875998
Email: mib.stmikbd@gmail.com
