COVID-19 Misinformation Detection in Indonesian Tweets using BERT

Fahmi Adi Nugraha; Dhomas Hatta Fudholi

doi:10.30865/mib.v7i2.5668

Authors

Fahmi Adi Nugraha, Universitas Islam Indonesia, Yogyakarta
Dhomas Hatta Fudholi, Universitas Islam Indonesia, Yogyakarta

DOI:

https://doi.org/10.30865/mib.v7i2.5668

Keywords:

COVID-19, Misinformation, Classification, BERT

Abstract

The COVID-19 pandemic has seen a marked increase in the spread of misinformation across media channels, most notably social media. This is particularly true of Indonesia, where a combination of middling digital literacy and slow fact-checking contributes to the continued spread of misinformation. Many of the solutions proposed by other researchers to address this problem do not use transformers, despite the existence of Indonesian-language BERT models. To provide both a potential solution to the problem of misinformation and a baseline for future research, we propose an IndoBERT-based model for detecting misinformation in Indonesian-language Tweets. For model training, we use the "small" version of the MuMiN dataset, a comprehensive multilingual dataset of fact-checked Tweets. The authors of MuMiN provide a baseline LaBSE model that achieves a macro-average F1-score of 54.5% when trained on the MuMiN "small" dataset. We train and evaluate our proposed model on this dataset in order to compare it with the LaBSE model. We also train and evaluate our model on a subset of the dataset containing only Tweets related to COVID-19, which we first translate into Indonesian. Our model achieves a best macro-average F1-score of 59.5% on the MuMiN dataset and 79.04% on the subset.
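The macro-average F1-score reported above can be illustrated with a minimal sketch. It assumes the common definition (the unweighted mean of per-class F1 scores); the label names and toy predictions are made up for illustration and are not taken from the paper or from MuMiN.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed definition)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: a classifier that always predicts the majority class.
y_true = ["misinformation"] * 8 + ["factual"] * 2
y_pred = ["misinformation"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

In this toy case the always-majority classifier reaches 80% accuracy, but its macro F1 is pulled down by the class it never predicts. That is the usual reason for preferring the macro average when one class dominates.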
