Information Gain Feature Selection for Optimizing Tuberculosis Classification
DOI: https://doi.org/10.30865/mib.v8i1.7122
Keywords: Tuberculosis, Machine Learning, Information Gain, Random Forest, Synthetic Minority Oversampling Technique
Abstract
Tuberculosis (TB), caused by Mycobacterium tuberculosis, is an airborne disease and a global health threat. Factors such as gender, age, and geographical location influence its spread. Indonesia, the country with the second-highest number of TB cases globally, recorded a significant increase in TB cases from 2020 to 2022, especially in Semarang City. To minimize the impact of TB, it is crucial to identify the factors that influence its progression. Machine Learning techniques such as feature selection (Information Gain) and classification algorithms (Random Forest) can be used for this purpose. Feature selection ranks attributes by weight to determine which factors most influence TB, while Random Forest performs the classification. Oversampling techniques such as the Synthetic Minority Oversampling Technique (SMOTE) are used to handle class imbalance and improve classification performance. The study concluded that the Random Forest classification model performed best on the original TB dataset from Semarang City when using all features, listed here from highest to lowest weight: ‘tipe_diagnosis’, ‘jenis_fasyankes’, ‘usia’, ‘kelurahan_kecamatan’, ‘riwayat_dm’, ‘riwayat_HIV’, ‘tahun’, ‘paduan_OAT’, ‘status_pekerjaan’, ‘jenis_kelamin’, ‘tipe_TBC’, ‘riwayat_TBC’, ‘bulan’ and ‘sumber_obat’. Recall and accuracy both reached 75%. This result is better than that of the model trained on the SMOTE-oversampled dataset using only the top 10-12 attributes, which reached a recall and accuracy of 74%. This research shows that Machine Learning techniques can help in understanding the factors that influence TB treatment outcomes.
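As an illustration of how attributes can be ranked by Information Gain, the following is a minimal sketch in Python and not the implementation used in the study. The attribute names ‘riwayat_dm’ and ‘jenis_kelamin’ come from the abstract; the target column name ‘hasil’, all record values, and the helper function names are hypothetical and chosen only for the example. The sketch assumes categorical attributes, so a numeric attribute such as ‘usia’ would need to be binned first.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on values."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, feature_names, target_name):
    """Return (feature, gain) pairs sorted from highest to lowest gain."""
    labels = [row[target_name] for row in rows]
    gains = {
        name: information_gain([row[name] for row in rows], labels)
        for name in feature_names
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Invented toy records; the target column "hasil" is a hypothetical name.
rows = [
    {"riwayat_dm": "ya", "jenis_kelamin": "L", "hasil": "gagal"},
    {"riwayat_dm": "ya", "jenis_kelamin": "P", "hasil": "gagal"},
    {"riwayat_dm": "tidak", "jenis_kelamin": "L", "hasil": "sembuh"},
    {"riwayat_dm": "tidak", "jenis_kelamin": "P", "hasil": "sembuh"},
]

ranking = rank_features(rows, ["jenis_kelamin", "riwayat_dm"], "hasil")
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy data ‘riwayat_dm’ separates the two outcomes perfectly and therefore ranks first, while ‘jenis_kelamin’ carries no information about the outcome. A ranking of this kind is what allows a classifier to be trained on only the top-k attributes and compared against one trained on all of them.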
License

This work is licensed under a Creative Commons Attribution 4.0 International License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).