Comparative Evaluation of Random Forest, XGBoost, LightGBM, and K-Nearest Neighbors for Weather Prediction in Semarang City

Authors

  • Maulana Wahyu Ibrahim, Universitas Dian Nuswantoro, Semarang
  • Wahyu Aji Eko Prabowo, Universitas Dian Nuswantoro, Semarang

DOI:

https://doi.org/10.30865/jurikom.v13i1.9488

Keywords:

Machine Learning, Imbalanced Data, SMOTE, Weather Prediction, Model Performance Evaluation

Abstract

Accurate weather prediction supports strategic decisions in many fields, from agriculture to disaster management. A fundamental obstacle to building automatic prediction models is that meteorological datasets often have an imbalanced class distribution. As a result, conventional machine learning algorithms favor the dominant class and detect the rare class (rain) poorly, which shows up as low sensitivity. This study aims to reduce that bias and improve daily rainfall classification by comparing four algorithms: Random Forest, K-Nearest Neighbors (KNN), LightGBM, and XGBoost. The Synthetic Minority Over-sampling Technique (SMOTE) was applied as the main remedy for the imbalance, generating new samples for the underrepresented class. Model performance was evaluated using a confusion matrix, a One-vs-Rest (OvR) strategy, and conventional evaluation metrics. The baseline models failed to detect the minority class, with very low Recall and F1-Score values (< 0.30). Applying SMOTE significantly improved Recall and F1-Score compared to the baseline without SMOTE. LightGBM with SMOTE was the best-performing model and achieved the most balanced results across all evaluation metrics.
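To make the oversampling and evaluation ideas in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a simplified form of SMOTE (each synthetic point is a random interpolation between a minority sample and one of its k nearest minority neighbors) and computes Recall and F1-Score for one class in a One-vs-Rest fashion. The function names `smote` and `ovr_scores`, the parameter defaults, and the toy (temperature, humidity) values are all hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Simplified illustration of the SMOTE idea; names and defaults are assumptions.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def ovr_scores(y_true, y_pred, label):
    """Recall and F1 with `label` as the positive class and all others negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


# Hypothetical toy data: (temperature, humidity) for rare "rain" days.
rain_days = [(24.0, 92.0), (25.0, 95.0), (23.5, 90.0), (24.5, 97.0)]
new_rain_days = smote(rain_days, n_new=6, k=2)

# Hypothetical predictions, scored for the "rain" class only.
y_true = ["rain", "dry", "rain", "dry"]
y_pred = ["rain", "dry", "dry", "dry"]
rain_recall, rain_f1 = ovr_scores(y_true, y_pred, "rain")
```

In practice, synthetic samples like these would be added only to the training split, so that the test set still reflects the original class distribution when Recall and F1-Score are measured.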

Published

2026-02-28

How to Cite

Ibrahim, M. W., & Prabowo, W. A. E. (2026). Evaluasi Komparatif Random Forest, XGBoost, LightGBM, dan K-Nearest Neighbors untuk Prediksi Cuaca di Kota Semarang. JURNAL RISET KOMPUTER (JURIKOM), 13(1), 347–357. https://doi.org/10.30865/jurikom.v13i1.9488