Review: Feature Extraction and Classification Methods for Speaker Identification

DOI:

https://doi.org/10.30865/mib.v6i1.3469

Keywords:

Speaker Identification, MFCC, GMM, Hybrid Classifier

Abstract

Personal identity is still commonly verified with an ID card (KTP, SIM, passport, etc.). This approach has a weakness: ID cards are easily damaged or lost. Biometric recognition systems offer an alternative by using characteristics of the human body as the basis for identification. Voice is a readily available source of biometric information, and voice pattern recognition is used in speaker identification to determine who is speaking. This paper reviews several feature extraction and classification methods that are often used in speaker identification. The choice of feature extraction and classification methods plays a role in both the computational cost and the accuracy of a speaker identification system. Across the datasets surveyed, the Mel Frequency Cepstral Coefficients (MFCC) feature extraction method achieves high accuracy even with noisy input. For classification, the Gaussian Mixture Model (GMM) is the most frequently used method because it can work in noisy conditions. More recently, hybrid classifiers have been developed that further increase accuracy.
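As a rough illustration of the identification step summarized in the abstract, the sketch below trains one statistical model per speaker and assigns a test utterance to the speaker whose model gives the highest average log-likelihood. It is a minimal sketch under stated assumptions, not the method of any surveyed paper: each speaker is modeled by a single diagonal-covariance Gaussian (a one-component simplification of a GMM), the feature frames are synthetic random vectors standing in for MFCC frames, and all function and speaker names are hypothetical.

```python
import math
import random


def fit_diag_gaussian(frames):
    """Estimate the per-dimension mean and variance of a list of feature frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # A small floor keeps the variance strictly positive.
    var = [
        max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
        for d in range(dim)
    ]
    return mean, var


def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood under a diagonal Gaussian model."""
    mean, var = model
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def identify(frames, models):
    """Return the name of the speaker model that best explains the frames."""
    return max(models, key=lambda name: avg_log_likelihood(frames, models[name]))


def synthetic_frames(rng, centre, n_frames, spread=1.0):
    """Assumption: random vectors around a centre stand in for MFCC frames."""
    return [[rng.gauss(c, spread) for c in centre] for _ in range(n_frames)]


rng = random.Random(0)
# Hypothetical speakers, each with a different feature distribution.
centres = {"speaker_a": [0.0, 0.0, 0.0], "speaker_b": [3.0, -2.0, 1.0]}
models = {
    name: fit_diag_gaussian(synthetic_frames(rng, centre, 200))
    for name, centre in centres.items()
}

# Score an unseen utterance drawn from speaker_b's distribution.
test_utterance = synthetic_frames(rng, centres["speaker_b"], 50)
predicted = identify(test_utterance, models)
print(predicted)
```

In a real system the synthetic frames would be replaced by MFCC vectors extracted from recorded speech, and each speaker model would typically use several mixture components rather than one.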

Published

2022-01-25