Hoax Detection of Indonesian News Media on Twitter Using IndoBERT with Word Embedding Word2Vec

Authors

  • Pernanda Arya Bhagaskara S M, Telkom University, Bandung
  • Sri Suryani Prasetiyowati, Telkom University, Bandung
  • Yuliant Sibaroni, Telkom University, Bandung

DOI:

https://doi.org/10.30865/mib.v7i3.6367

Keywords:

Indonesian News Media, Hoax, IndoBERT, Word2Vec, Social Media

Abstract

A hoax is information that has been added to or removed from news of an actual event. In the digital age, hoaxes spread ever more widely and people are quickly affected by them, especially hoaxes that circulate through Indonesian news media on social media. Disseminating information that has not been confirmed as accurate can cause public concern and anxiety. Social media has become a key communication channel for thinking about, discussing, and acting on social issues. This study therefore combines the IndoBERT model with Word2Vec feature expansion to classify hoax news in Indonesian news media. The model was built using K-Fold cross-validation to improve its performance on large datasets. The data consist of tweets shared on Twitter by the Indonesian public. The experiments show that combining Word2Vec with IndoBERT is effective at detecting hoaxes: the classification results give an overall accuracy of 88% on the entire dataset, and the best precision value in each iteration is almost 99%. The study's objective is to identify hoax news in Indonesian news media disseminated via social media. This should encourage people to be more cautious when reading and sharing news, since false information can significantly affect individuals.
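The K-Fold evaluation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `k_fold_indices`, `cross_validate`, and `fit_majority` are hypothetical, and `fit_majority` is only a placeholder for a real classifier such as IndoBERT with Word2Vec features. The sketch shows how per-fold accuracy and the overall mean are obtained.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds of (train, test) indices."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def cross_validate(
    texts: Sequence[str],
    labels: Sequence[int],
    fit: Callable[[List[str], List[int]], Callable[[str], int]],
    k: int,
) -> Tuple[List[float], float]:
    """Train on k-1 folds, test on the held-out fold; return fold scores and mean."""
    scores = []
    for train, test in k_fold_indices(len(texts), k):
        predict = fit([texts[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(texts[i]) == labels[i] for i in test)
        scores.append(correct / len(test))
    return scores, sum(scores) / len(scores)


def fit_majority(train_texts: List[str], train_labels: List[int]) -> Callable[[str], int]:
    """Hypothetical stand-in model: always predicts the most common training label."""
    # sorted() makes tie-breaking deterministic (smallest label wins).
    majority = max(sorted(set(train_labels)), key=train_labels.count)
    return lambda text: majority


if __name__ == "__main__":
    # Toy data, invented for illustration only (1 = hoax, 0 = not hoax).
    tweets = ["t1", "t2", "t3", "t4", "t5", "t6"]
    labels = [1, 1, 1, 1, 1, 0]
    fold_scores, mean_score = cross_validate(tweets, labels, fit_majority, k=3)
    print(fold_scores, round(mean_score, 4))
```

In a real experiment the `fit` argument would wrap model training and return a prediction function. The per-fold scores correspond to the per-iteration results and the mean to the overall figure.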

Published

2023-07-23
