Enhancing Machine Learning Accuracy in Detecting Preventable Diseases using Backward Elimination Method
DOI:
https://doi.org/10.30865/mib.v8i1.7073
Keywords:
Feature Selection, Backward Elimination, Machine Learning Algorithms, Disease Detection, KNN
Abstract
In the current landscape of abundant high-dimensional datasets, addressing classification challenges is pivotal. While prior studies have effectively used Backward Elimination (BE) for disease detection, there is a notable absence of research demonstrating the method's significance through comprehensive comparisons across diverse databases. This study extends that work by applying BE with multiple machine learning algorithms (MLAs): Naïve Bayes (NB), k-Nearest Neighbors (KNN), and Support Vector Machine (SVM), on datasets associated with preventable diseases (heart failure (HF), breast cancer (BC), and diabetes), in order to identify and explain the differences that BE produces across diverse datasets and machine learning (ML) methods. Testing was conducted on four distinct datasets: raisin, HF, BC, and early-stage diabetes risk prediction, each evaluated with the three MLAs. BE successfully eliminated non-significant attributes, retaining only influential ones in the model, and t-test results showed a significant impact on accuracy across all datasets (p-value < 0.05). In per-algorithm evaluations, SVM achieved the highest accuracy on the raisin dataset at 87.22%, while KNN achieved the highest accuracy on the heart failure dataset (86.31%), the breast cancer dataset (83.56%), and the diabetes dataset (96.15%). These results underscore the efficacy of BE in enhancing the performance of MLAs for disease detection.
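The backward-elimination idea described in the abstract can be sketched as follows. This is an illustrative sketch only, not the paper's implementation: the study removes attributes by statistical significance (p-values), whereas this toy version uses held-out 1-NN accuracy as the drop criterion, and the dataset and helper names are invented for the example.

```python
# Illustrative sketch of wrapper-style backward elimination (not the paper's
# exact procedure, which drops attributes by p-value significance instead).

def accuracy_1nn(train, test, feats):
    """Accuracy of a 1-nearest-neighbour classifier restricted to `feats`."""
    correct = 0
    for x, y in test:
        nearest = min(train, key=lambda t: sum((t[0][f] - x[f]) ** 2 for f in feats))
        correct += (nearest[1] == y)
    return correct / len(test)

def backward_elimination(train, test, n_features):
    """Greedily drop the feature whose removal most improves held-out accuracy."""
    feats = list(range(n_features))
    best_acc = accuracy_1nn(train, test, feats)
    while len(feats) > 1:
        # Score every candidate subset obtained by dropping one feature.
        scored = [(accuracy_1nn(train, test, [g for g in feats if g != f]), f)
                  for f in feats]
        acc, worst = max(scored)
        if acc <= best_acc:  # no removal helps: stop
            break
        feats.remove(worst)
        best_acc = acc
    return feats, best_acc

# Toy data: feature 0 separates the classes, feature 1 is pure noise.
train = [([0.0, 5.0], 0), ([0.1, -3.0], 0), ([1.0, 4.0], 1), ([0.9, -4.0], 1)]
test = [([0.05, -5.0], 0), ([0.95, 5.0], 1)]
selected, best = backward_elimination(train, test, 2)
```

In the paper's setting the drop criterion would instead be statistical (e.g. remove the attribute with the largest p-value while it exceeds 0.05), but the greedy elimination loop has the same shape.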
License

This work is licensed under a Creative Commons Attribution 4.0 International License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).