Sentiment Analysis using Random Forest and Word2Vec for Indonesian Language Movie Reviews

 (*) Fahriza Ichsani Rafif (Telkom University, Bandung, Indonesia)
 Mahendra Dwifebri Purbolaksono (Telkom University, Bandung, Indonesia)
 Widi Astuti (Telkom University, Bandung, Indonesia)

(*) Corresponding Author

Submitted: June 9, 2023; Published: July 23, 2023

Abstract

In recent years, the film industry has become one of the industries that attracts the most public interest. The convenience of streaming services is one reason watching movies is so popular. This ease of access has produced a large selection of available movies and encourages the public to look for reviews to find out whether a movie is good or bad. Freedom of expression on the internet has led to a large number of movie reviews being published. Therefore, sentiment analysis was conducted to classify these reviews as positive or negative. The method used in this research is Random Forest for classification, with Word2Vec Skip-Gram for feature extraction. Random Forest was chosen because it is a highly flexible and highly accurate method, while Word2Vec Skip-Gram was chosen because it is an efficient model for learning a large number of word vectors from unstructured text. The best model obtained from this experiment was built with stemming, Word2Vec with 300 dimensions, and a max_depth value of 23, achieving an F1-score of 83.59%.
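
As an illustration of two ideas named in the abstract, the sketch below shows what a Skip-Gram model trains on (pairs of a center word and each neighbor inside a context window) and how an F1-score is computed from predicted labels. It is a minimal, standard-library sketch and not the authors' code: the function names, the window size, the sample tokens, and the choice of binary F1 on the positive class are all assumptions made for the example. The abstract does not state which F1 averaging was used.

```python
from typing import List, Sequence, Tuple


def skip_gram_pairs(tokens: Sequence[str], window: int = 2) -> List[Tuple[str, str]]:
    """Build (center, context) pairs, the training examples a Skip-Gram model learns from.

    `window` is the number of neighbors considered on each side (assumed value).
    """
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 for the `positive` label: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical tokenized review: "film ini sangat bagus" ("this film is very good").
    review = ["film", "ini", "sangat", "bagus"]
    print(skip_gram_pairs(review, window=1))

    # Hypothetical gold and predicted labels (1 = positive, 0 = negative).
    print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In practice, a library such as gensim would train the 300-dimensional Skip-Gram vectors and a Random Forest implementation with a max_depth setting would consume features derived from them. Those steps are omitted here to keep the sketch free of external dependencies.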

Keywords


Sentiment Analysis; Random Forest; Word2Vec; Movie Review




Copyright (c) 2023 JURNAL MEDIA INFORMATIKA BUDIDARMA

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.



JURNAL MEDIA INFORMATIKA BUDIDARMA
STMIK Budi Darma
Secretariat: Sisingamangaraja No. 338 Telp 061-7875998
Email: mib.stmikbd@gmail.com
