Improved Text Classification for Indonesian Hate Speech Detection: FastText-LSTM Model with Easy Data Augmentation

Hilman Singgih Wicaksana; Khairul Huda; Gregorius Airlangga

doi:10.30865/json.v7i3.9637

Authors

Hilman Singgih Wicaksana, Universitas Karya Husada, https://orcid.org/0000-0001-9486-9601
Khairul Huda, Universitas Karya Husada Semarang
Gregorius Airlangga, Universitas Katolik Indonesia Atma Jaya

Keywords:

Bayesian Optimization, Easy Data Augmentation, FastText, Hate Speech Detection, Long Short-Term Memory

Abstract

The rapid expansion of social media in Indonesia has led to a significant rise in hate speech, creating an urgent need for effective automated detection techniques. This research evaluates the proposed FastText-Long Short-Term Memory model with Easy Data Augmentation (FastText-LSTM-WE) against a baseline model, FastText-Convolutional Neural Network with Easy Data Augmentation (FastText-CNN-WE). To isolate the impact of data augmentation, both architectures were also assessed without Easy Data Augmentation (FastText-LSTM-WO and FastText-CNN-WO). Bayesian Optimization was employed to identify the best hyperparameter configuration for each model. All experiments were carried out on a dataset of 14,306 samples under consistent experimental conditions. Model performance was measured using precision, recall, F1-score, and accuracy derived from the confusion matrix. FastText-LSTM-WE achieved the highest performance, with precision, recall, F1-score, and accuracy of 84.02%, 83.16%, 83.59%, and 81.37%, respectively. These findings indicate that the proposed model is a reliable approach to detecting hate speech in Indonesian text and can support automated content moderation systems in practical applications.
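The four evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes a binary confusion matrix (hate speech as the positive class) and the standard metric definitions; the abstract does not state the class setup or averaging scheme, so those are assumptions. The `ConfusionMatrix` class, the function names, and the example counts are hypothetical, chosen for illustration only and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Binary confusion matrix counts (hypothetical container, not from the paper)."""
    tp: int  # hate speech correctly flagged
    fp: int  # non-hate speech wrongly flagged
    fn: int  # hate speech missed
    tn: int  # non-hate speech correctly passed


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def evaluate(cm: ConfusionMatrix) -> dict:
    """Compute precision, recall, F1-score, and accuracy from the counts."""
    predicted_pos = cm.tp + cm.fp
    actual_pos = cm.tp + cm.fn
    n = cm.tp + cm.fp + cm.fn + cm.tn
    precision = cm.tp / predicted_pos if predicted_pos else 0.0
    recall = cm.tp / actual_pos if actual_pos else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
        "accuracy": (cm.tp + cm.tn) / n if n else 0.0,
    }


# Illustrative counts only; these are NOT the paper's confusion matrix.
example = evaluate(ConfusionMatrix(tp=80, fp=20, fn=10, tn=90))

# Consistency check on the reported figures: the harmonic mean of the
# reported precision (84.02) and recall (83.16), rounded to two decimals.
reported_f1 = round(f1_score(84.02, 83.16), 2)
```

Under these definitions, the reported F1-score of 83.59% matches the harmonic mean of the reported precision and recall, which is what `reported_f1` checks.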
