Hate Speech Detection in Indonesia Twitter Comments Using Convolutional Neural Network (CNN) and FastText Word Embedding

 (*) Fadhilah Nadia Puteri (Telkom University, Bandung, Indonesia)
 Yuliant Sibaroni (Telkom University, Bandung, Indonesia)
 Fitriyani F. (Telkom University, Bandung, Indonesia)

(*) Corresponding Author

Submitted: June 19, 2023; Published: July 23, 2023

Abstract

Hate speech is a common problem in Indonesia, including on social media platforms such as Twitter. It refers to any form of communication, whether oral, written, or symbolic, that may offend, threaten, or insult an individual or group based on attributes such as religion, race, ethnicity, sexual orientation, or other characteristics. Freedom of expression and communication on social media allows hate speech to spread quickly and widely. To counter this, a system is needed that can detect hate speech on social media. Deep learning is potentially better at recognizing and analyzing the language patterns that reflect hate speech in text. A previous study obtained an accuracy of 73.2% using a Convolutional Neural Network. This study proposes a hate speech detection system that uses a Convolutional Neural Network model with FastText word embeddings. The combination performs well in detecting hate speech: with K-Fold Cross Validation and an appropriate dropout value, the model achieves an accuracy of 80%. This accuracy can serve as a benchmark indicating that the model can help limit the spread of hate speech on social media.
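
The abstract mentions K-Fold Cross Validation together with a choice of dropout value but gives no procedure. The following is only a minimal sketch of one common way to combine the two, not the authors' implementation: each candidate dropout rate is scored by its mean validation accuracy over the K folds, and the best one is kept. The names `k_fold_indices`, `select_dropout`, and `dummy_score`, the candidate rates, and the fold count are all assumptions made for illustration. Shuffling and stratification are omitted for brevity.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Fold = Tuple[List[int], List[int]]


def k_fold_indices(n_samples: int, k: int) -> List[Fold]:
    """Split range(n_samples) into k (train, validation) index pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, val))
        start += size
    return folds


def select_dropout(
    n_samples: int,
    k: int,
    candidates: Sequence[float],
    train_and_score: Callable[[List[int], List[int], float], float],
) -> Tuple[float, Dict[float, float]]:
    """Return the candidate with the highest mean validation accuracy."""
    folds = k_fold_indices(n_samples, k)
    scores = {
        rate: mean(train_and_score(train, val, rate) for train, val in folds)
        for rate in candidates
    }
    best = max(scores, key=scores.get)
    return best, scores


def dummy_score(train_idx: List[int], val_idx: List[int], dropout: float) -> float:
    # Placeholder only (assumption): a real version would train the CNN on
    # train_idx with this dropout rate and return accuracy on val_idx.
    return 1.0 - abs(dropout - 0.5)


best, scores = select_dropout(
    n_samples=10, k=5, candidates=[0.2, 0.5, 0.8], train_and_score=dummy_score
)
```

In practice `dummy_score` would be replaced by a function that builds and trains the classifier on the training indices and returns its accuracy on the validation indices.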

Keywords


Hate Speech; Twitter; Deep Learning; Convolutional Neural Network; FastText


Copyright (c) 2023 JURNAL MEDIA INFORMATIKA BUDIDARMA

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.



JURNAL MEDIA INFORMATIKA BUDIDARMA
STMIK Budi Darma
Secretariat: Sisingamangaraja No. 338 Telp 061-7875998
Email: mib.stmikbd@gmail.com
