Improving a Model's Ability to Predict Heart Disease with the NCL Algorithm and GridSearchCV

 Zulfan Ahmadi (Universitas Muhammadiyah Pontianak, Pontianak, Indonesia)
 (*)Asrul Abdullah (Universitas Muhammadiyah Pontianak, Pontianak, Indonesia)
 Izhan Fakhruzi (Universitas Muhammadiyah Pontianak, Pontianak, Indonesia)

(*) Corresponding Author

Submitted: May 11, 2023; Published: October 22, 2023

Abstract

Heart disease is the leading cause of death worldwide. Reducing this mortality requires accurate prediction, so that people at risk can be warned early and the condition can be prevented or managed. This study aims to improve the ability of a machine learning classification model, Logistic Regression (LR), to predict heart disease, so that prediction errors that could harm patients are substantially reduced. The work combines two approaches: data preparation and model optimization. During data preparation, the dataset was found to be imbalanced between people with heart disease and people without it. The Neighborhood Cleaning Rule (NCL) algorithm was applied to correct this imbalance, and it had a significant effect on the performance of the prediction model. During model optimization, GridSearchCV was used to find the best combination of hyperparameters for the LR algorithm. The study also implemented Weighted Logistic Regression, which allows class weights to be set and further contributed to model performance. Evaluation with Accuracy, Recall, and Area Under the Curve (AUC) shows that the model improved: Recall increased from 0.10 to 0.93, and AUC increased from 0.83 to 0.98. The dataset originates from the Centers for Disease Control and Prevention (CDC) and was obtained from Kaggle. With better predictive ability, the model is expected to provide accurate early warnings to individuals at risk and thereby help reduce mortality from heart disease.
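To make the data preparation step more concrete, the following is a minimal pure-Python sketch of the idea behind the Neighborhood Cleaning Rule for a two-class problem. It is not the authors' implementation. The toy points, the function names `k_nearest` and `ncl_indices_to_remove`, and the choice of k = 3 are assumptions made for illustration only. The sketch removes majority-class samples in two cases: when a majority sample is outvoted by its own nearest neighbours, and when majority samples are the neighbours that outvote a minority sample.

```python
from collections import Counter
from math import dist


def k_nearest(index, points, k=3):
    """Indices of the k points closest to points[index], excluding itself."""
    others = [j for j in range(len(points)) if j != index]
    others.sort(key=lambda j: dist(points[index], points[j]))
    return others[:k]


def ncl_indices_to_remove(points, labels, minority, k=3):
    """Indices of majority samples that this simplified NCL would drop.

    Assumes two classes and an odd k, so the neighbour vote cannot tie.
    """
    remove = set()
    for i, label in enumerate(labels):
        neighbours = k_nearest(i, points, k)
        vote = Counter(labels[j] for j in neighbours).most_common(1)[0][0]
        if vote == label:
            continue  # sample agrees with its neighbourhood; nothing to clean
        if label != minority:
            # Step 1: a majority sample misclassified by its neighbours is noise.
            remove.add(i)
        else:
            # Step 2: a minority sample is outvoted; drop its majority neighbours.
            remove.update(j for j in neighbours if labels[j] != minority)
    return remove


# Hypothetical toy data: label 0 = no heart disease (majority), 1 = heart disease.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),   # majority cluster
    (-3, -3), (-3, -2), (-2, -3), (-2, -2),       # second majority cluster
    (5, 5), (5, 6), (6, 5),                       # minority cluster
    (5.5, 5.5),                                   # majority sample inside the minority cluster
    (0.2, 0.7),                                   # minority sample inside the majority cluster
]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1]

removed = ncl_indices_to_remove(points, labels, minority=1)
kept_labels = [lab for i, lab in enumerate(labels) if i not in removed]

print("removed indices:", sorted(removed))
print("class counts before:", dict(Counter(labels)))
print("class counts after:", dict(Counter(kept_labels)))
```

Only majority samples are ever removed, so the minority class keeps all of its examples while the class ratio moves closer to balance. On a real dataset one would more likely rely on an existing implementation such as `NeighbourhoodCleaningRule` in the imbalanced-learn package rather than this quadratic-time sketch.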

Keywords


Heart Disease; GridSearchCV; Weighted Logistic Regression; Neighborhood Cleaning Rule; Machine Learning
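
The model optimization step named in the abstract and keywords can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the study's actual configuration: the data are synthetic (generated with `make_classification`, not the CDC dataset), and the hyperparameter grid, the candidate class weights, the 5-fold setting, and the use of recall as the selection score are all hypothetical choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic imbalanced data standing in for the real dataset (assumption).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid: regularization strength and class weights.
# A non-None class_weight is what makes the logistic regression "weighted".
param_grid = {
    "C": [0.01, 0.1, 1.0, 10.0],
    "class_weight": [None, "balanced", {0: 1, 1: 5}],
}

# Exhaustive search over the grid with 5-fold cross-validation,
# selecting the combination with the best mean recall.
search = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid, scoring="recall", cv=5
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on held-out data.
y_pred = search.predict(X_test)
y_score = search.predict_proba(X_test)[:, 1]
recall = recall_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_score)

print("best parameters:", search.best_params_)
print("test recall:", round(recall, 2))
print("test AUC:", round(auc, 2))
```

Selecting on recall reflects the abstract's concern with missed heart disease cases. A different scoring choice, such as `"roc_auc"`, would be an equally reasonable assumption and may select different hyperparameters.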

Copyright (c) 2023 JURNAL MEDIA INFORMATIKA BUDIDARMA

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.



JURNAL MEDIA INFORMATIKA BUDIDARMA
STMIK Budi Darma
Secretariat: Sisingamangaraja No. 338 Telp 061-7875998
Email: mib.stmikbd@gmail.com
