Implementation of Speech Recognition Using Long Short-Term Memory for Presentation Software

Satriya Adhitama, Donny Avianto

Abstract


A presentation is a way of delivering thoughts, ideas, and concepts to an audience verbally. Presentation software supports this by letting the presenter organize the sequence of material and display it in a visually appealing form. Operating such software, however, requires a remote, mouse, keyboard, or even a personal assistant, which can distract the presenter and limit their freedom while delivering the material. This distraction can be addressed by using speech recognition to issue commands to the presentation software. The speech recognition system is built with Long Short-Term Memory (LSTM), which mitigates the long-term dependency and vanishing gradient problems of standard Recurrent Neural Networks (RNN). Ten command words are used to operate the presentation software. LSTM outperforms the alternative techniques DNN, CNN, and SimpleRNN, achieving a training accuracy of 96.5%, a validation accuracy of 94.8%, and a testing accuracy of 94%. These results indicate that LSTM can be used effectively on sequential data to recognize speech in real time.
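
To illustrate how an LSTM turns a sequence of audio feature frames into one of 10 command classes, the following is a minimal sketch in plain Python. It is not the authors' implementation: the weights are random and untrained, and the feature dimension (13 values per frame), the hidden size, and all function names are assumptions chosen only for illustration. The additive update of the cell state `c` is the part of the design that helps gradients survive across long sequences.

```python
import math
import random

NUM_CLASSES = 10   # the abstract mentions 10 command words
FEATURE_DIM = 13   # assumption: 13 feature values per audio frame
HIDDEN_DIM = 8     # assumption: small hidden size for illustration


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(rng):
    # One weight matrix and bias per gate, applied to [h_prev ; x_t].
    params = {}
    for gate in ("f", "i", "o", "g"):
        params["W_" + gate] = random_matrix(
            HIDDEN_DIM, HIDDEN_DIM + FEATURE_DIM, rng)
        params["b_" + gate] = [0.0] * HIDDEN_DIM
    params["W_out"] = random_matrix(NUM_CLASSES, HIDDEN_DIM, rng)
    params["b_out"] = [0.0] * NUM_CLASSES
    return params


def lstm_step(params, x_t, h_prev, c_prev):
    z = h_prev + x_t  # list concatenation: [h_prev ; x_t]

    def gate(name, activation):
        pre = matvec(params["W_" + name], z)
        return [activation(p + b) for p, b in zip(pre, params["b_" + name])]

    f = gate("f", sigmoid)    # forget gate
    i = gate("i", sigmoid)    # input gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)  # candidate cell values
    # Additive cell-state update: keep part of the old state, add new content.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(params, frames):
    # Run the LSTM over all frames, then classify from the last hidden state.
    h = [0.0] * HIDDEN_DIM
    c = [0.0] * HIDDEN_DIM
    for x_t in frames:
        h, c = lstm_step(params, x_t, h, c)
    logits = [s + b for s, b in
              zip(matvec(params["W_out"], h), params["b_out"])]
    return softmax(logits)


rng = random.Random(0)
params = init_params(rng)
# Stand-in for one utterance: 20 frames of random feature values.
frames = [[rng.uniform(-1.0, 1.0) for _ in range(FEATURE_DIM)]
          for _ in range(20)]
probs = classify(params, frames)
predicted_class = max(range(NUM_CLASSES), key=lambda k: probs[k])
```

With untrained weights the predicted class is arbitrary. In a trained system the class index would be mapped to one of the command words, which the abstract does not list.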

Keywords


Speech Recognition; Classification; LSTM; Speech Command; Presentation Software

DOI: https://doi.org/10.30865/json.v5i2.6950

Copyright (c) 2023 Satriya Adhitama, Donny Avianto

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Jurnal Sistem Komputer dan Informatika (JSON)
Managed by Universitas Budi Darma
Secretariat: Jln. Sisingamangaraja No. 338, Tel. 061-7875998
Email: jurnal.json@gmail.com

