Emotion Classification on Indonesian Text Data Using the BERT, RoBERTa, and DistilBERT Algorithms

 (*) Faishal Basbeth (Universitas Islam Indonesia, Yogyakarta, Indonesia)
 Dhomas Hatta Fudholi (Universitas Islam Indonesia, Yogyakarta, Indonesia)

(*) Corresponding Author

Submitted: February 8, 2024; Published: April 30, 2024

Abstract

Previous sentiment analysis studies have covered many text domains, including product reviews, movies, news, and opinions. Sentiment analysis focuses on recognizing valence, that is, whether a text has a positive or negative orientation. A less explored but needed task within sentiment analysis is emotion classification: recognizing the type of emotion a text expresses. This research contributes knowledge on applying distilBERT to Indonesian-language data, which is still relatively new, and shows that fine-tuning and hyperparameter tuning of distilBERT succeed when compared against several other methods. Emotion classification can support decision-making in companies, government, and marketing strategy. It is a branch of sentiment analysis that uses more detailed and in-depth emotion labels. Motivated by these needs, this research builds and fine-tunes a distilBERT model to classify the emotion of an Indonesian sentence. The research consists of four steps: data collection, data preprocessing, hyperparameter tuning and fine-tuning, and comparison of the results of the models applied in this research: BERT, RoBERTa, and distilBERT. DistilBERT achieved the highest training accuracy (94.76), distilBERT-freeze achieved the highest testing accuracy (86.67), and distilBERT Modif achieved the highest f1-score (87). For distilBERT-freeze, the main contributor to the model's results is the dropout hyperparameter, whose removal lowers the f1-score by 3; the second contributor is the freeze-layer, whose removal lowers the f1-score by 1.
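
The abstract compares models by accuracy and f1-score. As a minimal sketch of how these two metrics can be computed for a multi-class emotion task, the example below uses plain Python. The abstract does not say how the f1-score was averaged across classes, so macro averaging is an assumption here, and the emotion labels and predictions are hypothetical data for illustration only, not results from the study.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # class p was predicted but wrong
            fn[t] += 1   # class t was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold and predicted emotion labels, for illustration only.
y_true = ["anger", "joy", "sadness", "joy", "fear", "anger"]
y_pred = ["anger", "joy", "joy", "joy", "fear", "sadness"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.4f}, macro F1 = {f1:.4f}")
```

Macro averaging weights every emotion class equally, so a class the model never predicts correctly pulls the score down even when overall accuracy looks acceptable. A weighted average would instead follow the class frequencies.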

Keywords


Sentiment Analysis; Emotion Classification; DistilBERT; Fine-Tuning; Indonesian Language

Copyright (c) 2024 JURNAL MEDIA INFORMATIKA BUDIDARMA

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.



JURNAL MEDIA INFORMATIKA BUDIDARMA
STMIK Budi Darma
Secretariat: Sisingamangaraja No. 338 Telp 061-7875998
Email: mib.stmikbd@gmail.com
