Misogyny Text Detection on TikTok Social Media in Indonesian Using the Pre-trained Language Model IndoBERTweet

Perwira Hanif Zakaria, Dade Nurjannah, Hani Nurrahmi

Abstract


Social media is a popular platform for communication and information because it is fast and easy to access. It lets people express themselves freely, which some irresponsible users exploit to spread hate speech aimed at demeaning a person or a group of people. Misogyny is a form of hate speech directed at women, and it should not be underestimated because it can be a major source of distress for women. This study builds a model to detect misogynistic text in Indonesian-language TikTok comments using the pre-trained IndoBERTweet model. IndoBERTweet is a BERT-based model pre-trained on Indonesian-language Twitter data, which makes it a good fit for classifying social media text as misogynistic or not. The dataset consists of comments collected from short-video content on women's TikTok accounts, focusing on four forms of misogyny: stereotyping, dominance, sexual harassment, and discrediting. The model was trained with a batch size of 16, 10 epochs, and a learning rate of 7e-5, and was evaluated using a confusion matrix, achieving a best accuracy of 76.89%.
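As a minimal sketch of the evaluation step, the snippet below derives accuracy from a binary confusion matrix. The function names and the label lists are hypothetical and purely illustrative; they are not the study's data or code, and they do not reproduce the reported 76.89%.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, positive=1):
    """Count (TP, FP, FN, TN) for a binary task; `positive` marks the misogyny class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def accuracy(tp, fp, fn, tn):
    """Accuracy = correctly classified items / all items."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


# Hypothetical labels (1 = misogynistic, 0 = not), for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
print(f"TP={tp} FP={fp} FN={fn} TN={tn} accuracy={accuracy(tp, fp, fn, tn):.2%}")
```

Accuracy alone can mislead when the two classes are imbalanced, so precision and recall computed from the same four counts are often reported alongside it.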

Keywords


Misogyny; BERT; Pre-trained Model; IndoBERTweet






DOI: https://doi.org/10.30865/mib.v7i3.6438



Copyright (c) 2023 JURNAL MEDIA INFORMATIKA BUDIDARMA

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.



JURNAL MEDIA INFORMATIKA BUDIDARMA
Universitas Budi Darma
Secretariat: Sisingamangaraja No. 338 Telp 061-7875998
Email: mib.stmikbd@gmail.com
