Improving Infant Cry Recognition with CNNs and Imbalance Mitigation

Authors

  • Michael Indrawan, Universitas Dian Nuswantoro, Semarang
  • Ardytha Luthfiarta, Universitas Dian Nuswantoro, Semarang
  • Muhammad Daffa Al Fahreza, Universitas Dian Nuswantoro, Semarang
  • Muhammad Rafid, Universitas Dian Nuswantoro, Semarang

DOI:

https://doi.org/10.30865/mib.v8i2.7370

Keywords:

Baby Cry Classification, Neural Network, Handling Data Imbalance, Audio Analysis

Abstract

Classifying baby cries with machine learning is essential for building automated systems that help caregivers identify and respond to an infant's needs promptly and accurately. This study aims to improve on previous research using the Cry Baby Dataset, which is highly imbalanced. To address the imbalance, we combine oversampling and undersampling using SMOTE and ENN with data augmentation through pitch shifting and noise addition. The processed data were then modeled with a Convolutional Neural Network (CNN). The dataset covers five classes: cries attributed to belly pain caused by colic, cries related to burping, cries associated with discomfort or other symptoms, cries of hunger, and cries indicating fatigue or the need for sleep. The model reached an overall accuracy of 88%, with accuracy balanced across all classes, which indicates that the data imbalance was effectively mitigated. This is a notable improvement over previous studies, whose accuracy varied widely from class to class.
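
The resampling step named in the abstract can be illustrated with a minimal pure-Python sketch. It is a simplified illustration of the two ideas (SMOTE interpolates new minority samples, ENN removes samples that disagree with their neighbours), not the authors' pipeline and not the imbalanced-learn implementation. The 2-D points stand in for audio features and are hypothetical, as are the function names `nearest`, `smote`, and `enn` and the values of `k`; only the class names come from the abstract.

```python
import math
import random
from collections import Counter


def nearest(point, candidates, k):
    """Return the k (features, label) pairs closest to `point` (Euclidean)."""
    return sorted(candidates, key=lambda c: math.dist(point, c[0]))[:k]


def smote(samples, label, n_new, k=2, rng=None):
    """SMOTE-style oversampling: create n_new synthetic points for `label` by
    interpolating between a sample and one of its k same-class neighbours."""
    rng = rng or random.Random(0)
    minority = [(x, y) for x, y in samples if y == label]
    synthetic = []
    for _ in range(n_new):
        x, _ = rng.choice(minority)
        others = [m for m in minority if m[0] is not x]
        neighbour, _ = rng.choice(nearest(x, others, k))
        gap = rng.random()  # position along the segment x -> neighbour
        new_x = tuple(a + gap * (b - a) for a, b in zip(x, neighbour))
        synthetic.append((new_x, label))
    return synthetic


def enn(samples, k=3):
    """Edited Nearest Neighbours: drop every sample whose label disagrees
    with the majority label of its k nearest neighbours (all classes)."""
    kept = []
    for i, (x, y) in enumerate(samples):
        others = samples[:i] + samples[i + 1:]
        votes = Counter(lbl for _, lbl in nearest(x, others, k))
        if votes.most_common(1)[0][0] == y:
            kept.append((x, y))
    return kept


# Hypothetical toy data: a large "hungry" cluster, a small "burping" cluster,
# and one "hungry" sample that sits inside the "burping" cluster.
noisy = ((2.1, 2.1), "hungry")
data = [
    ((0.0, 0.0), "hungry"), ((0.1, 0.2), "hungry"), ((0.2, 0.1), "hungry"),
    ((0.3, 0.3), "hungry"), ((0.2, 0.3), "hungry"), ((0.1, 0.1), "hungry"),
    noisy,
    ((2.0, 2.0), "burping"), ((2.2, 2.1), "burping"), ((2.1, 2.3), "burping"),
]

synthetic = smote(data, "burping", n_new=4)  # oversample the minority class
cleaned = enn(data + synthetic)              # then remove ambiguous samples

print(Counter(label for _, label in data))
print(Counter(label for _, label in cleaned))
```

In this toy setup the oversampling step brings the two classes to the same size, and the cleaning step then removes the one sample whose neighbours all carry the other label. Running ENN on all classes after oversampling is an assumption of this sketch.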

Published

2024-04-30

Section

Articles