Improving Naïve Bayes Accuracy with Chi-Square and SMOTE to Handle High-Dimensional and Imbalanced Flood Data

Vito Junivan Rivaldo, Taghfirul Azhima Yoga Siswa, Wawan Joko Pranoto

Abstract


Floods are among the most frequent natural disasters in Indonesia. The city of Samarinda is flooded every year, with significant losses. The data used in this study come from the Regional Disaster Management Agency (BPBD) and the Meteorology, Climatology, and Geophysics Agency (BMKG), cover Samarinda from 2021 to 2023, and comprise 11 attributes and 1,095 records. Previous data mining studies on floods have run into two problems: high-dimensional data and class imbalance. High dimensionality can lead to overfitting and reduced accuracy, while imbalanced data biases the model toward the majority class and leaves the minority class poorly represented. This study aims to improve the accuracy of the Naïve Bayes algorithm in predicting floods from high-dimensional, imbalanced data. The approach combines Chi-Square feature selection, used to find the most relevant features for flood prediction, with oversampling by the Synthetic Minority Over-sampling Technique (SMOTE). Validation uses 10-fold cross-validation, and accuracy is computed from a confusion matrix. The results show that Chi-Square identifies the four best features: average humidity (rh_avg), rainfall (rr), wind direction at maximum wind speed (ddd_x), and most frequent wind direction (ddd_car). Naïve Bayes with SMOTE achieved an accuracy of 71.58%. After Chi-Square feature selection was applied, however, accuracy dropped to 60.82%. The decline is attributed to a reduced number of minority-class instances after feature selection. Chi-Square feature selection is therefore not sufficiently effective at improving the accuracy of Naïve Bayes on this high-dimensional data.
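The three building blocks named in the abstract (Chi-Square scoring, SMOTE-style oversampling, and accuracy from a confusion matrix) can be sketched in miniature as follows. This is an illustrative, standard-library-only sketch and not the authors' implementation: the function names `chi_square_score`, `smote_like`, and `accuracy`, the neighbour count `k`, and all toy values are assumptions introduced here, and the Naïve Bayes training and 10-fold cross-validation steps are left out.

```python
import random
from collections import Counter


def chi_square_score(feature_values, labels):
    """Pearson chi-square statistic between a categorical feature and the class labels.

    A higher score means the feature is less independent of the class,
    which is the ranking criterion used in Chi-Square feature selection.
    """
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    feature_totals = Counter(feature_values)
    label_totals = Counter(labels)
    score = 0.0
    for f, f_total in feature_totals.items():
        for c, c_total in label_totals.items():
            expected = f_total * c_total / n  # count expected under independence
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points in the spirit of SMOTE.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


def accuracy(tp, tn, fp, fn):
    """Accuracy from the four cells of a binary confusion matrix."""
    return (tp + tn) / (tp + tn + fp + fn)


if __name__ == "__main__":
    # Toy data (hypothetical, not taken from the BPBD/BMKG dataset).
    labels = [0, 0, 1, 1]
    print(chi_square_score([0, 0, 1, 1], labels))  # feature fully tied to the class
    print(chi_square_score([0, 1, 0, 1], labels))  # feature independent of the class
    minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0)]
    print(smote_like(minority, n_new=3))
    print(accuracy(tp=50, tn=30, fp=10, fn=10))
```

In practice, oversampling is usually applied only to the training folds inside cross-validation, so that synthetic samples never leak into the test fold; whether the study did this is not stated in the abstract.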

Keywords


Naïve Bayes; Chi-Square; SMOTE; High Dimensional; Imbalanced Data


DOI: https://doi.org/10.30865/mib.v8i3.7886



Copyright (c) 2024 JURNAL MEDIA INFORMATIKA BUDIDARMA

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.



JURNAL MEDIA INFORMATIKA BUDIDARMA
Universitas Budi Darma
Secretariat: Sisingamangaraja No. 338 Telp 061-7875998
Email: mib.stmikbd@gmail.com
