Optimizing Random Forest with Genetic Algorithm and Recursive Feature Elimination on High-Dimensional Stunting Data from Samarinda

Authors

  • Bima Satria, Universitas Muhammadiyah Kalimantan Timur, Samarinda
  • Taghfirul Azhima Yoga Siswa, Universitas Muhammadiyah Kalimantan Timur, Samarinda
  • Wawan Joko Pranoto, Universitas Muhammadiyah Kalimantan Timur, Samarinda

DOI:

https://doi.org/10.30865/mib.v8i3.7883

Keywords:

Random Forest, Recursive Feature Elimination, Genetic Algorithm, Classification, High Dimension

Abstract

Stunting is a form of chronic malnutrition that impairs children's growth, with long-term effects on physical growth, cognitive development, and productivity in adulthood. In Indonesia, the prevalence of stunting remains above the WHO threshold, at 24.4% according to the 2021 Indonesian Nutritional Status Study (SSGI). In Samarinda City, the prevalence reached 24.7% in 2021, with 1,402 toddlers identified as stunted. Addressing this problem requires a more structured, data-driven approach so that interventions can be targeted. This study aims to improve the accuracy of stunting data classification in Samarinda City in 2023 using the Random Forest algorithm, enhanced with Recursive Feature Elimination (RFE) for feature selection and Genetic Algorithm (GA) optimization. It uses data from the Samarinda City Health Office, comprising 150,474 stunting data points, and proceeds through data collection, data cleaning, feature selection, and application of the classification model. The RFE results show that the most influential features are Weight, ZS TB/U, ZS BB/U, and BB/U. Applying RFE increased the model's average accuracy from 91.91% to 93.64%, and GA optimization increased it further to 98.39%. The definite accuracy increased from 94.23% (baseline model) to 97.10% (with RFE) and reached 99.70% (with RFE and GA). The combination of RFE and GA proved effective in handling data complexity and improving the reliability of stunting predictions. This study contributes to the development of machine learning techniques for high-dimensional data analysis in health and is expected to serve as a foundation for more effective intervention programs addressing stunting in Indonesia.
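The pipeline described in the abstract (Random Forest, RFE feature selection, then GA optimization) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the data is synthetic (generated with scikit-learn's `make_classification`), the GA is assumed to tune the hyperparameters `n_estimators` and `max_depth` (the abstract does not say what the GA optimizes), and the GA design (truncation selection, uniform crossover, small integer mutations) and the helper names `fitness`, `crossover`, `mutate`, and `genetic_search` are hypothetical.

```python
import random

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the stunting data (assumption: the real data is not used here).
X, y = make_classification(n_samples=300, n_features=12, n_informative=4,
                           n_redundant=2, random_state=0)

# Step 1: RFE repeatedly drops the least important feature until 4 remain.
selector = RFE(RandomForestClassifier(n_estimators=30, random_state=0),
               n_features_to_select=4)
X_sel = selector.fit_transform(X, y)


def fitness(params):
    """Mean 3-fold cross-validated accuracy for (n_estimators, max_depth)."""
    n_estimators, max_depth = params
    model = RandomForestClassifier(n_estimators=n_estimators,
                                   max_depth=max_depth, random_state=0)
    return cross_val_score(model, X_sel, y, cv=3).mean()


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def mutate(ind, rng, rate=0.3):
    """Nudge each gene with probability `rate`, clamped to its allowed range."""
    n, d = ind
    if rng.random() < rate:
        n = min(60, max(10, n + rng.randint(-10, 10)))
    if rng.random() < rate:
        d = min(10, max(2, d + rng.randint(-2, 2)))
    return (n, d)


def genetic_search(pop_size=6, generations=3, seed=0):
    """Step 2: a small GA over Random Forest hyperparameters."""
    rng = random.Random(seed)
    population = [(rng.randint(10, 60), rng.randint(2, 10))
                  for _ in range(pop_size)]
    scores = {}  # cache so each individual is evaluated only once
    for _ in range(generations):
        for ind in population:
            if ind not in scores:
                scores[ind] = fitness(ind)
        # Truncation selection: the better half survives and breeds.
        parents = sorted(population, key=scores.get, reverse=True)[: pop_size // 2]
        children = [mutate(crossover(*rng.sample(parents, 2), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    for ind in population:  # score the final generation's children
        if ind not in scores:
            scores[ind] = fitness(ind)
    best = max(scores, key=scores.get)
    return best, scores[best]


best_params, best_score = genetic_search()
print("selected feature mask:", selector.support_)
print("best (n_estimators, max_depth):", best_params)
print("cross-validated accuracy:", round(best_score, 4))
```

Running RFE before the GA keeps each fitness evaluation cheap, since every candidate is scored on the reduced feature set only. The accuracies this sketch prints come from synthetic data and are not comparable to the figures reported in the abstract.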

Published

2024-07-28

Section

Articles