Seleksi Fitur Information Gain untuk Optimasi Klasifikasi Penyakit Tuberkulosis (Information Gain Feature Selection for Optimizing Tuberculosis Classification)

 (*) Ardi Caesar Kurniawan (Universitas Dian Nuswantoro, Semarang, Indonesia)
 Abu Salam (Universitas Dian Nuswantoro, Semarang, Indonesia)

(*) Corresponding Author

Submitted: December 12, 2023; Published: January 9, 2024

Abstract

Tuberculosis (TB), caused by Mycobacterium tuberculosis, is an airborne disease and a global health threat. Factors such as gender, age, and geographical location influence its spread. Indonesia, which has the second-highest number of TB cases in the world, recorded a significant increase in cases from 2020 to 2022, especially in Semarang City. To minimize TB's impact, it is crucial to identify the factors that influence its progression. Machine learning techniques can help: feature selection with Information Gain ranks attributes by weight to show which factors most influence TB, and the Random Forest algorithm performs the classification. The Synthetic Minority Oversampling Technique (SMOTE) is used to handle class imbalance, with the aim of improving classification performance. The study concludes that the Random Forest model performed best on the original Semarang City TB dataset when using all features. Ranked from highest to lowest weight, these are: tipe_diagnosis, jenis_fasyankes, usia, kelurahan_kecamatan, riwayat_dm, riwayat_HIV, tahun, paduan_OAT, status_pekerjaan, jenis_kelamin, tipe_TBC, riwayat_TBC, bulan, and sumber_obat. With this configuration, recall and accuracy both reached 75%. This is better than the model trained on the SMOTE-oversampled dataset using only the top 10-12 attributes, which reached a recall and accuracy of 74%. The research shows that machine learning techniques can help in understanding the factors that influence TB treatment outcomes.
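
The attribute ranking described in the abstract can be illustrated with a minimal sketch of Information Gain for categorical features. This is not the authors' code. The toy records, the target column name `hasil_pengobatan`, and the label values are hypothetical and chosen only for illustration; the two attribute names are taken from the abstract. Information Gain is computed here as the entropy of the class labels minus the weighted entropy remaining after splitting on one attribute.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Label entropy minus the weighted entropy after splitting on one categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, feature_names, target):
    """Return (feature, information gain) pairs sorted from highest to lowest gain."""
    labels = [row[target] for row in rows]
    scores = {
        name: information_gain([row[name] for row in rows], labels)
        for name in feature_names
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy records; NOT taken from the Semarang City dataset.
toy_rows = [
    {"tipe_diagnosis": "A", "jenis_kelamin": "L", "hasil_pengobatan": "success"},
    {"tipe_diagnosis": "A", "jenis_kelamin": "P", "hasil_pengobatan": "success"},
    {"tipe_diagnosis": "B", "jenis_kelamin": "L", "hasil_pengobatan": "failure"},
    {"tipe_diagnosis": "B", "jenis_kelamin": "P", "hasil_pengobatan": "failure"},
]

ranking = rank_features(toy_rows, ["tipe_diagnosis", "jenis_kelamin"], "hasil_pengobatan")
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy data, `tipe_diagnosis` separates the two outcomes perfectly and so ranks first, while `jenis_kelamin` carries no information about the outcome. Numeric attributes such as usia would need to be discretized before this calculation, which is an assumption of the sketch rather than a step stated in the abstract.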

Keywords


Tuberculosis; Machine Learning; Information Gain; Random Forest; Synthetic Minority Oversampling Technique




Copyright (c) 2024 JURNAL MEDIA INFORMATIKA BUDIDARMA

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.



JURNAL MEDIA INFORMATIKA BUDIDARMA
STMIK Budi Darma
Secretariat: Sisingamangaraja No. 338 Telp 061-7875998
Email: mib.stmikbd@gmail.com
