Improving a Model's Ability to Predict Heart Disease with the NCL Algorithm and GridSearchCV
DOI:
https://doi.org/10.30865/mib.v7i3.6142
Keywords:
Heart Disease, GridSearchCV, Weighted Logistic Regression, Neighborhood Cleaning Rule, Machine Learning
Abstract
Heart disease is the leading cause of death worldwide. Reducing this mortality requires accurate prediction, so that people at risk can be warned early and the condition can be prevented or managed. This study aims to improve the ability of a machine learning classifier, Logistic Regression (LR), to predict heart disease, so that prediction errors that could harm patients are substantially reduced. Two approaches are used: data preparation and model optimization. During data preparation, the data were found to be imbalanced between people with and without heart disease. The Neighborhood Cleaning Rule (NCL) algorithm was applied to correct this imbalance, which markedly improved the performance of the prediction model. During model optimization, GridSearchCV was used to find the best combination of LR hyperparameters. The study also implemented Weighted Logistic Regression, which allows class weights to be set and further contributed to the improvement. Evaluation with Accuracy, Recall, and Area Under the Curve (AUC) shows that the model's ability increased: recall rose from 0.10 to 0.93, and AUC rose from 0.83 to 0.98. The dataset was obtained from Kaggle and originates from the Centers for Disease Control and Prevention (CDC). With a better ability to identify heart disease, the model is expected to provide accurate early warning to individuals at risk and thereby help reduce mortality from heart disease.
License

This work is licensed under a Creative Commons Attribution 4.0 International License.