Hate Speech Detection in Indonesia Twitter Comments Using Convolutional Neural Network (CNN) and FastText Word Embedding

Authors

  • Fadhilah Nadia Puteri, Telkom University, Bandung
  • Yuliant Sibaroni, Telkom University, Bandung
  • Fitriyani F., Telkom University, Bandung

DOI:

https://doi.org/10.30865/mib.v7i3.6401

Keywords:

Hate Speech, Twitter, Deep Learning, Convolutional Neural Network, FastText

Abstract

Hate speech is a persistent problem in Indonesia, including on social media platforms such as Twitter. It refers to any form of communication, whether oral, written, or symbolic, that may offend, threaten, or insult an individual or group based on attributes such as religion, race, ethnicity, sexual orientation, or other characteristics. Freedom of expression and communication on social media allows hate speech to spread quickly and widely. Limiting that spread requires a system that can detect hate speech on social media. Deep learning has the potential to recognize and analyze the language patterns that reflect hate speech in text. A previous study obtained an accuracy of 73.2% using the Convolutional Neural Network (CNN) method. This study proposes a hate speech detection system that uses a CNN model with FastText word embedding. Using K-Fold Cross Validation and an appropriate dropout value, the CNN classifier with FastText embeddings achieves an accuracy of 80%. This result suggests that the model can help limit the spread of hate speech on social media.
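
The abstract pairs K-Fold Cross Validation with the choice of a dropout value but does not give the number of folds or the candidate rates. The sketch below is one plausible reading, not the authors' implementation: it splits the sample indices into k folds and averages a validation score per candidate dropout rate. The function names, the default k = 5, and the `train_and_score` callable (a stand-in for training and evaluating the CNN with FastText embeddings) are all assumptions made for illustration.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def k_fold_indices(n_samples: int, k: int, seed: int = 0) -> List[Split]:
    """Return k (train_indices, validation_indices) pairs over range(n_samples)."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits


def select_dropout(
    n_samples: int,
    dropout_rates: Sequence[float],
    train_and_score: Callable[[List[int], List[int], float], float],
    k: int = 5,  # assumed value; the abstract does not state k
) -> Tuple[float, Dict[float, float]]:
    """Pick the dropout rate with the highest mean validation score across folds."""
    splits = k_fold_indices(n_samples, k)
    mean_scores = {}
    for rate in dropout_rates:
        # train_and_score is hypothetical: it should train a model on the
        # train indices with this dropout rate and return validation accuracy.
        scores = [train_and_score(train, val, rate) for train, val in splits]
        mean_scores[rate] = mean(scores)
    best_rate = max(mean_scores, key=mean_scores.get)
    return best_rate, mean_scores
```

In practice `train_and_score` would wrap the actual model training; here it is left as a parameter so the fold logic can be read and checked on its own.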

Author Biography

Fadhilah Nadia Puteri, Telkom University, Bandung

Informatics Study Program

Published

2023-07-23

Section

Articles