A Comparative Study of Machine Learning Models for Heart Disease Classification with SMOTE on Imbalanced Data
DOI:
https://doi.org/10.30865/jurikom.v13i1.9485
Keywords:
Random Forest, XGBoost, LightGBM, Logistic Regression, Heart Disease Classification, Imbalanced Data
Abstract
This study examines the effect of the Synthetic Minority Over-sampling Technique (SMOTE) on heart disease classification with four machine learning algorithms: Logistic Regression, Random Forest, LightGBM, and XGBoost. The experiments use the Heart Disease UCI dataset, which consists of 920 medical records with 16 clinical features. The original severity labels (0–4) are converted into two classes, not sick (0) and sick (1–4), to match the binary decisions required in clinical screening. Two scenarios are compared: (1) training the models on the original data without handling class imbalance, and (2) training the models with SMOTE applied only to the training data inside a pipeline, together with hyperparameter tuning using k-fold cross-validation. Model performance is evaluated with accuracy, precision, recall, F1-score, and AUC-ROC, and with confusion matrix analysis to examine misclassifications, particularly false negatives in the sick class. Without SMOTE, the best model, Logistic Regression, achieves an accuracy of 84.78%, recall of 84.31%, F1-score of 86.00%, and AUC-ROC of 91.95%, although the number of false negatives remains relatively high. With SMOTE, recall and F1-score for the positive class increase across all models; the best performance is obtained by Random Forest, which achieves an accuracy of 86.96%, recall of 87.25%, F1-score of 88.12%, and AUC-ROC of 93.34%. These findings indicate that combining SMOTE with hyperparameter optimization can produce a more balanced and reliable heart disease classifier, one that is potentially useful in clinical decision support systems in healthcare services.
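The label conversion and the oversampling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (binarize, smote, oversample), the toy data, and the choice of k are assumptions made for illustration, and the interpolation below is a simplified, standard-library-only rendering of the SMOTE idea rather than a library implementation. It is applied to training data only, in line with the second scenario of the abstract.

```python
import random


def binarize(severity):
    """Map an original severity label (0-4) to 0 (not sick) or 1 (sick)."""
    return 0 if severity == 0 else 1


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a randomly chosen
    minority row and one of its k nearest minority neighbours.
    Assumes the minority class has at least two rows when n_new > 0."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def oversample(X, y, k=5, seed=0):
    """Balance a binary training set by adding synthetic minority rows."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    minority, minority_label = (pos, 1) if len(pos) < len(neg) else (neg, 0)
    n_new = abs(len(pos) - len(neg))
    new_rows = smote(minority, n_new, k=k, seed=seed)
    return X + new_rows, y + [minority_label] * n_new


# Hypothetical toy training split: six "not sick" rows and three "sick" rows.
severity = [0, 0, 0, 0, 0, 0, 1, 3, 4]
y_train = [binarize(s) for s in severity]
X_train = [[float(i), float(i % 3)] for i in range(len(severity))]

X_bal, y_bal = oversample(X_train, y_train, k=2)
print(sum(y_bal), len(y_bal) - sum(y_bal))  # prints "6 6"
```

The test data should never pass through oversample; otherwise synthetic rows derived from test records would inflate the reported metrics. In practice, a library implementation such as the SMOTE class in imbalanced-learn, placed inside that library's pipeline so that resampling happens within each cross-validation fold, is the more usual choice.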