Hoax Detection of Indonesian News Media on Twitter Using IndoBERT with Word Embedding Word2Vec

Pernanda Arya Bhagaskara S M; Sri Suryani Prasetiyowati; Yuliant Sibaroni

doi:10.30865/mib.v7i3.6367

Authors

Pernanda Arya Bhagaskara S M, Telkom University, Bandung
Sri Suryani Prasetiyowati, Telkom University, Bandung
Yuliant Sibaroni, Telkom University, Bandung

DOI:

https://doi.org/10.30865/mib.v7i3.6367

Keywords:

Indonesian News Media, Hoax, IndoBERT, Word2Vec, Social Media

Abstract

A hoax is information that has been added to or removed from news about actual events. In the digital age, hoaxes spread increasingly widely, and people are quickly affected by them, especially hoaxes circulating in Indonesian news media on social media. Disseminating information that has not been confirmed as accurate can cause public concern and anxiety. Social media has become a key communication channel for thinking about, discussing, and acting on social issues. This research therefore combines the IndoBERT model with Word2Vec feature expansion to classify hoax news in Indonesian news media. The model was built using K-fold cross-validation to improve performance across a large dataset. The data consist of tweets shared on Twitter by the Indonesian public. The experiments show that combining Word2Vec with IndoBERT is effective at detecting hoaxes: the classification results give an overall accuracy of 88% for the entire dataset, and the best accuracy value obtained in each iteration is almost 99%. The study aims to identify hoax news in Indonesian news media disseminated via social media, which should encourage people to be more cautious when reading and sharing news, since false information can significantly affect certain individuals.
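The K-fold evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a plain shuffled split into k folds and uses a hypothetical majority-label baseline (`majority_baseline`) as a stand-in for the IndoBERT and Word2Vec classifier, which is not reproduced here. The toy tweets, labels, function names, and the choice of k=5 are all invented for illustration, and the abstract does not state whether its overall figure is the mean of the per-fold scores as computed below.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(texts, labels, train_fn, k=5, seed=0):
    """Return the accuracy on each held-out fold.

    train_fn(train_texts, train_labels) must return a function that
    maps one text to a predicted label.
    """
    scores = []
    for held_out in k_fold_indices(len(texts), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(texts)) if i not in held]
        predict = train_fn([texts[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        correct = sum(predict(texts[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


def majority_baseline(train_texts, train_labels):
    """Hypothetical placeholder model: always predicts the majority label."""
    majority = max(set(train_labels), key=train_labels.count)
    return lambda text: majority


# Toy data, invented for this sketch (not the paper's dataset).
texts = [f"tweet {i}" for i in range(20)]
labels = ["hoax" if i % 4 == 0 else "not_hoax" for i in range(20)]

fold_scores = cross_validate(texts, labels, majority_baseline, k=5)
overall = mean(fold_scores)   # average accuracy across folds
best_fold = max(fold_scores)  # best single-fold accuracy
```

Reporting both the average and the best single-fold score, as the abstract does, makes sense in this setting because individual folds can score well above the average.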

