Analysis of Random Forest-Based Ensemble Methods for Classifying Stroke Events on a Public Dataset
DOI:
https://doi.org/10.30865/json.v7i3.9496
Keywords:
Random Forest, Ensemble Learning, Stroke Classification, SMOTE, Healthcare Dataset
Abstract
Stroke is one of the leading causes of disability and death worldwide, so data-driven approaches are needed to support the systematic classification of stroke events. This study analyzes variations of Random Forest-based ensemble methods on the public healthcare-dataset-stroke-data dataset from Kaggle, which comprises 5,110 patient records with 11 demographic and cardiovascular risk-factor variables. Preprocessing consists of median imputation of missing values in the bmi attribute, outlier handling with the interquartile range (IQR) method, and class balancing with SMOTE. Three model scenarios are built within a single uniform pipeline: Random Forest as the baseline, Bagging Random Forest, and AdaBoost Random Forest. Evaluation uses 5-fold cross-validation with accuracy, precision, recall, and F1-score as metrics. The results show differences in the evaluation metrics across the ensemble schemes, with the AdaBoost Random Forest configuration reaching an accuracy of 94.70% under the test configuration used. The study focuses on varying the ensemble strategy within a single Random Forest framework and a uniform preprocessing pipeline, which yields a controlled and reproducible evaluation.
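The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal sketch that uses only the Python standard library. It is not the authors' implementation. The abstract does not say whether outliers were removed or capped, nor which IQR multiplier was used, so the sketch assumes capping at the conventional 1.5 × IQR fences. The function names and the toy values are hypothetical, and SMOTE and the ensemble models themselves are left out.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def cap_outliers(values, k=1.5):
    """Clip values to the IQR fences (assumption: capping, not row removal)."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy bmi column with one missing value and one extreme value (made-up data).
bmi = [20.0, 24.0, None, 28.0, 32.0, 90.0]
bmi_clean = cap_outliers(impute_median(bmi))

# Toy labels: 1 = stroke, 0 = no stroke (made-up data).
scores = classification_metrics([1, 0, 0, 1, 0, 0, 0, 1],
                                [1, 0, 0, 0, 0, 1, 0, 1])
```

On a dataset as imbalanced as this one, accuracy alone can look high even when few stroke cases are detected, which is why precision, recall, and F1 are reported alongside it. In a cross-validated setup, the imputation median, the IQR fences, and any oversampling would normally be fitted on the training folds only, so that no information leaks from the validation fold.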
License
Copyright (c) 2026 Jurnal Sistem Komputer dan Informatika (JSON)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).

