Effectiveness of Word Embedding GloVe and Word2Vec within News Detection of Indonesian Using LSTM

 Muhammad Ghifari Adrian (Telkom University, Bandung, Indonesia)
 Sri Suryani Prasetyowati (Telkom University, Bandung, Indonesia)
 (*) Yuliant Sibaroni (Telkom University, Bandung, Indonesia)

(*) Corresponding Author

Submitted: June 20, 2023; Published: July 23, 2023

Abstract

Social media use in Indonesia has continued to grow in recent years. This growth brings both benefits and drawbacks: news is easily accessible to anyone, but much of the information that circulates is hoax news. Hoax news must be detected because it spreads false and misleading information, which undermines the integrity of information and needs to be clarified for the public. Detecting hoax news helps ensure that the information being disseminated is accurate and trustworthy. This study detects hoax news in Indonesian news media on Twitter using LSTM with the GloVe and Word2Vec word embeddings, and compares the two embeddings to find which gives the better LSTM performance. GloVe and Word2Vec were chosen for comparison because both represent words as vectors, yet their performance may differ: Word2Vec might better capture semantic relationships between words, whereas GloVe might better capture distributional relationships and word co-occurrence. The results show that LSTM with Word2Vec performs better than LSTM with GloVe in detecting Indonesian-language hoax news. LSTM with Word2Vec achieved an average accuracy of 95%, while LSTM with GloVe achieved an average accuracy of 90%.
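
The contrast drawn in the abstract between the two embeddings can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the function names, and the window size are assumptions made for illustration, and the co-occurrence counts are unweighted for simplicity. Word2Vec (in its skip-gram form) learns from local (center, context) pairs, whereas GloVe is fitted to co-occurrence counts aggregated over the whole corpus.

```python
from collections import Counter


def skipgram_pairs(tokens, window=2):
    """Return local (center, context) pairs, the kind of signal skip-gram Word2Vec trains on."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cooccurrence_counts(sentences, window=2):
    """Aggregate pair counts over the whole corpus, the kind of statistic GloVe is fitted to."""
    counts = Counter()
    for tokens in sentences:
        counts.update(skipgram_pairs(tokens, window))
    return counts


# Hypothetical toy corpus (not taken from the study's dataset).
corpus = [
    ["berita", "hoaks", "menyebar", "cepat"],
    ["berita", "hoaks", "menyesatkan", "publik"],
]

pairs = skipgram_pairs(corpus[0], window=1)
counts = cooccurrence_counts(corpus, window=1)

print(pairs)
print(counts.most_common(3))
```

In this sketch the pair ("berita", "hoaks") appears once per sentence as a local training example, while the aggregated table records that it occurs twice across the corpus; that global count is the quantity a GloVe-style model would fit.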

Keywords


Hoax; Classification; GloVe; Word2Vec; LSTM




Copyright (c) 2023 JURNAL MEDIA INFORMATIKA BUDIDARMA

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.



JURNAL MEDIA INFORMATIKA BUDIDARMA
STMIK Budi Darma
Secretariat: Sisingamangaraja No. 338 Telp 061-7875998
Email: mib.stmikbd@gmail.com
