COVID-19 Misinformation Detection in Indonesian Tweets using BERT
DOI:
https://doi.org/10.30865/mib.v7i2.5668
Keywords:
COVID-19, Misinformation, Classification, BERT
Abstract
The COVID-19 pandemic has seen a marked increase in the spread of misinformation across media channels, most notably social media. This is particularly true of Indonesia, where middling digital literacy and slow fact-checking contribute to its continued spread. Many of the solutions proposed by other researchers do not use transformers, despite the existence of Indonesian-language BERT models. To provide both a potential solution to the problem of misinformation and a baseline for future research, we propose an IndoBERT-based model for detecting misinformation in Indonesian-language Tweets. For model training, we use the "small" version of the MuMiN dataset, a comprehensive multilingual dataset of fact-checked Tweets. The authors of MuMiN provide a baseline LaBSE model that achieves a macro-average F1-score of 54.5% when trained on the MuMiN "small" dataset. We train and evaluate our proposed model on this dataset in order to compare it with the LaBSE baseline. We also train and evaluate our model on a subset of the dataset containing only Tweets related to COVID-19, which we first translate into Indonesian. Our model achieves a best macro-average F1-score of 59.5% on the MuMiN dataset and 79.04% on the subset.
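The results above are reported as macro-average F1-scores. As a rough illustration of what that metric measures, the following minimal Python sketch computes it as the unweighted mean of per-class F1 scores. The function name `macro_f1` and the toy labels are invented for this example and are not taken from the paper or from MuMiN; the paper's own evaluation code may differ, and another variant of macro F1 is instead computed from class-averaged precision and recall.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gets a false positive
            fn[truth] += 1   # true class gets a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, invented data: an imbalanced set where the classifier
# always predicts the majority class.
y_true = ["misinformation"] * 8 + ["factual"] * 2
y_pred = ["misinformation"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy: {accuracy:.3f}")
print(f"macro F1: {macro_f1(y_true, y_pred):.3f}")
```

In this toy case the accuracy looks high, yet the macro-average F1 is much lower because the minority class is never predicted and contributes an F1 of zero. This is one reason macro averaging is a common choice when classes are imbalanced.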
License

This work is licensed under a Creative Commons Attribution 4.0 International License