Use of the Stacking Classifier Algorithm in a Cardiovascular Risk Detection System
DOI: https://doi.org/10.30865/jurikom.v12i6.9402
Keywords: Cardiovascular Disease, Stacking Classifier, K-Modes Clustering, Stratified Cross-Validation, Computational Efficiency
Abstract
Cardiovascular disease is a leading cause of death worldwide. However, the complexity of medical data often prevents conventional models from capturing hidden patterns, which results in suboptimal predictive performance. This study evaluates the effectiveness of a hybrid model that integrates K-Modes Clustering with the Stacking Classifier algorithm, and tests whether the added model complexity yields a significant performance improvement over a single model. The methodology comprises data preprocessing (including outlier handling), clinical feature engineering, and cluster feature extraction using K-Modes (K=2). The Stacking Classifier architecture is built from five optimized heterogeneous base learners (CatBoost, Decision Tree, MLP, SVC, Logistic Regression) with XGBoost as the meta-learner, and is validated with Stratified 5-Fold Cross-Validation. The results show that although K-Modes effectively mapped clinically valid risk categories, the Stacking Classifier (87.99% accuracy, 95.89% ROC-AUC) did not surpass the best single model, CatBoost (88.03% accuracy, 95.90% ROC-AUC). The most significant finding concerns computational efficiency: the Stacking Classifier took roughly 560 times as long to run (7587.7686 seconds) as CatBoost (13.4635 seconds) without a commensurate gain in performance. This indicates that Boosting-based algorithms can capture complex patterns without an additional ensemble layer, so an optimized single model is recommended for real-world deployment because it offers the best balance between prediction accuracy and computational efficiency.
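The abstract does not describe how the K-Modes cluster feature is computed. As a rough illustration only, the following is a minimal pure-Python sketch of Huang-style K-Modes. It uses simple-matching dissimilarity and per-attribute mode updates on categorical records stored as tuples. The seeded random initialisation, the toy attribute values, and the helper names `k_modes` and `matching_dissimilarity` are illustrative assumptions, not the authors' code.

```python
from collections import Counter
import random


def matching_dissimilarity(a, b):
    """Simple-matching dissimilarity: count of attributes where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, k, max_iter=100, seed=0):
    """Minimal Huang-style K-Modes clustering for categorical tuples.

    Returns (labels, modes). Initial modes are k distinct records sampled
    with a fixed seed; this initialisation is an illustrative assumption.
    """
    rng = random.Random(seed)
    distinct = list(dict.fromkeys(tuple(r) for r in records))
    if k > len(distinct):
        raise ValueError("k is larger than the number of distinct records")
    modes = rng.sample(distinct, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins the cluster with the nearest mode
        # (ties go to the lowest cluster index).
        new_labels = [
            min(range(k), key=lambda c: matching_dissimilarity(r, modes[c]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each mode becomes the most frequent category per attribute.
        for c in range(k):
            members = [r for r, lab in zip(records, labels) if lab == c]
            if members:  # an empty cluster keeps its previous mode
                modes[c] = tuple(
                    Counter(column).most_common(1)[0][0] for column in zip(*members)
                )
    return labels, modes


# Hypothetical categorical attributes: (cholesterol, glucose, smoker, active).
records = [
    ("high", "high", "yes", "no"),
    ("high", "high", "yes", "no"),
    ("high", "normal", "yes", "no"),
    ("normal", "normal", "no", "yes"),
    ("normal", "normal", "no", "yes"),
    ("normal", "high", "no", "yes"),
]
labels, modes = k_modes(records, k=2)

# Cluster feature extraction: append the cluster label as an extra feature.
augmented = [record + (label,) for record, label in zip(records, labels)]
```

With K=2, the toy records split into a "high-risk-looking" group and a "low-risk-looking" group. The integer cluster IDs themselves are arbitrary and may swap between runs with different seeds. In a pipeline like the one the abstract describes, this label would then be passed, typically after encoding, as an additional input to the downstream classifiers.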