Seleksi Fitur Information Gain untuk Optimasi Klasifikasi Penyakit Tuberkulosis (Information Gain Feature Selection for Optimizing Tuberculosis Classification)

Ardi Caesar Kurniawan; Abu Salam

doi:10.30865/mib.v8i1.7122

Authors

Ardi Caesar Kurniawan, Universitas Dian Nuswantoro, Semarang
Abu Salam, Universitas Dian Nuswantoro, Semarang

DOI:

https://doi.org/10.30865/mib.v8i1.7122

Keywords:

Tuberculosis, Machine Learning, Information Gain, Random Forest, Synthetic Minority Oversampling Technique

Abstract

Tuberculosis (TB), caused by Mycobacterium tuberculosis, is an airborne disease and a global health threat. Factors such as gender, age, and geographical location influence its spread. Indonesia, which has the second-highest number of TB cases in the world, recorded a significant increase in cases from 2020 to 2022, especially in Semarang City. To minimize the impact of TB, it is crucial to identify the factors that influence its progression. Machine learning techniques can help: feature selection with Information Gain ranks attributes by weight to show which factors most influence TB, and the Random Forest algorithm performs the classification. The Synthetic Minority Oversampling Technique (SMOTE) is used to handle class imbalance, with the aim of improving classification performance. The study found that the Random Forest model performed best on the original Semarang City TB dataset when it used all features. Ordered from highest to lowest weight, these are: ‘tipe_diagnosis’, ‘jenis_fasyankes’, ‘usia’, ‘kelurahan_kecamatan’, ‘riwayat_dm’, ‘riwayat_HIV’, ‘tahun’, ‘paduan_OAT’, ‘status_pekerjaan’, ‘jenis_kelamin’, ‘tipe_TBC’, ‘riwayat_TBC’, ‘bulan’, and ‘sumber_obat’. With this configuration, recall and accuracy both reached 75%. This is better than the model trained on the SMOTE-oversampled dataset using only the top 10-12 attributes, which reached a recall and accuracy of 74%. The research shows that machine learning techniques can help in understanding the factors that influence TB treatment outcomes.
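
As a rough illustration of how Information Gain ranks attributes by weight, the sketch below computes the gain of each categorical attribute as the entropy of the class labels minus the weighted entropy after splitting on that attribute. It is not the authors' implementation. The two attribute names are taken from the abstract, but the attribute values, the class labels ("cured", "failed"), and the four records are hypothetical and invented purely for demonstration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, labels, feature):
    """Label entropy minus the weighted entropy after splitting on `feature`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels):
    """Return (feature, gain) pairs sorted from highest to lowest gain."""
    gains = {f: information_gain(rows, labels, f) for f in rows[0]}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy records: values and labels are invented, not from the dataset.
rows = [
    {"tipe_diagnosis": "A", "jenis_kelamin": "L"},
    {"tipe_diagnosis": "A", "jenis_kelamin": "P"},
    {"tipe_diagnosis": "B", "jenis_kelamin": "L"},
    {"tipe_diagnosis": "B", "jenis_kelamin": "P"},
]
labels = ["cured", "cured", "failed", "failed"]

ranking = rank_features(rows, labels)
for feature, gain in ranking:
    print(f"{feature}: {gain:.3f}")
```

In this toy data, 'tipe_diagnosis' separates the two classes perfectly and therefore ranks first, while 'jenis_kelamin' carries no information about the label. Keeping only the top-k entries of such a ranking is one way to form the reduced attribute sets (for example, the top 10-12 attributes) that the abstract compares against the full set.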

