Hate Speech Detection in Indonesia Twitter Comments Using Convolutional Neural Network (CNN) and FastText Word Embedding

Fadhilah Nadia Puteri; Yuliant Sibaroni; Fitriyani F.

doi:10.30865/mib.v7i3.6401

Authors

Fadhilah Nadia Puteri Telkom University, Bandung
Yuliant Sibaroni Telkom University, Bandung
Fitriyani F. Telkom University, Bandung

Keywords:

Hate Speech, Twitter, Deep Learning, Convolutional Neural Network, FastText

Abstract

Hate speech is a recurring problem in Indonesia, including on social media platforms such as Twitter. It refers to any form of communication, whether oral, written, or symbolic, that may offend, threaten, or insult an individual or group based on attributes such as religion, race, ethnicity, sexual orientation, or other characteristics. Freedom of expression and communication on social media enables hate speech to spread quickly and widely. To counter this, a system is needed that can detect hate speech on social media. Deep learning is potentially better at recognizing and analyzing the language patterns that reflect hate speech in text. A previous study obtained an accuracy of 73.2% using the Convolutional Neural Network method. This study proposes a hate speech detection system that uses a Convolutional Neural Network model with FastText word embedding. The Convolutional Neural Network classifier combined with FastText word embedding performs well in detecting hate speech: with K-Fold Cross Validation and an appropriate dropout value, the model achieves an accuracy of 80%. This accuracy can serve as a benchmark indicating that the model can help curb the spread of hate speech on social media.
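One way to read the evaluation described in the abstract is that each candidate dropout value is scored with K-Fold Cross Validation and the value with the best mean accuracy is kept. The sketch below illustrates that idea only and is not the authors' implementation. The function names `kfold_indices` and `select_dropout` and the callback `train_and_score` are hypothetical, and the abstract does not state the number of folds or the candidate dropout values.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def kfold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds.

    Returns one (train_indices, validation_indices) pair per fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    splits, start = [], 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, val))
        start += size
    return splits


def select_dropout(
    n_samples: int,
    k: int,
    dropout_rates: Sequence[float],
    train_and_score: Callable[[List[int], List[int], float], float],
) -> Tuple[float, float]:
    """Return (best_rate, mean_accuracy) across k folds.

    `train_and_score` is a hypothetical callback: it is assumed to train a
    classifier on the training indices with the given dropout rate and to
    return the accuracy measured on the validation indices.
    """
    results = {}
    for rate in dropout_rates:
        scores = [
            train_and_score(train, val, rate)
            for train, val in kfold_indices(n_samples, k)
        ]
        results[rate] = mean(scores)
    best = max(results, key=results.get)
    return best, results[best]
```

In practice the tweets would normally be shuffled (and possibly stratified by label) before splitting, and `train_and_score` would wrap the actual CNN training run.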

Author Biography

Fadhilah Nadia Puteri, Telkom University, Bandung

Informatics Study Program
