The Effect of Text Preprocessing on Sentiment Analysis of Public Comments on Twitter Social Media (Case Study: COVID-19 Pandemic)
DOI: https://doi.org/10.30865/mib.v5i2.2835
Keywords: COVID-19, Twitter, Sentiment Analysis, Preprocessing, Support Vector Machine
Abstract
COVID-19 is a pandemic that has troubled many people, and it has prompted a large volume of public comments on the social media platform Twitter. These comments can be used for sentiment analysis to determine the polarity of the sentiment they express: positive, negative, or neutral. A problem with Twitter data is that tweets still contain many non-standard words, such as abbreviations, because of the limit on the number of characters allowed in a single tweet. Preprocessing is the most important initial stage of sentiment analysis on Twitter data because it affects classification performance. This study focuses specifically on preprocessing: it runs several test scenarios that combine preprocessing techniques to determine which combination yields the highest accuracy and how each affects sentiment analysis. Features are extracted using N-grams and weighted using TF-IDF, and Mutual Information is used for feature selection. The classifier is a Support Vector Machine (SVM), chosen because it can classify high-dimensional data such as the text data used in this study. The results show that the best performance, an accuracy of 77.77%, is obtained by two combinations: cleaning and stemming; and word normalization, cleaning, and stemming. Unigrams yield higher accuracy than bigrams. Mutual Information reduces overfitting by removing irrelevant features, so that training and testing accuracy remain fairly stable.
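The preprocessing techniques combined in the test scenarios (cleaning, word normalization, and stemming) can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the regular expressions, the fixed step order (cleaning, then normalization, then stemming), the tiny hypothetical slang dictionary `SLANG`, and the `preprocess` flags are all assumptions. Stemming is left as a pluggable function because the abstract does not name the Indonesian stemmer that was used.

```python
import re

# Hypothetical slang/abbreviation dictionary, for illustration only; the
# normalization dictionary used in the study is not given in the abstract.
SLANG = {"yg": "yang", "gk": "tidak", "tdk": "tidak", "bgt": "banget", "dgn": "dengan"}


def clean(tweet: str) -> str:
    """Lowercase and strip URLs, mentions, hashtag signs, digits, and punctuation."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"@\w+", " ", text)                     # user mentions
    text = text.replace("#", " ")                         # keep hashtag word, drop '#'
    text = re.sub(r"[^a-z\s]", " ", text)                 # digits, punctuation, emoji
    return re.sub(r"\s+", " ", text).strip()


def normalize(tokens, slang=SLANG):
    """Replace non-standard words with their standard form when known."""
    return [slang.get(tok, tok) for tok in tokens]


def preprocess(tweet, use_cleaning=True, use_normalization=True, stem=None):
    """Apply one preprocessing scenario and return a list of tokens.

    stem: optional callable mapping a word to its root; None skips stemming.
    Case folding is assumed to happen even when cleaning is switched off.
    """
    text = clean(tweet) if use_cleaning else tweet.lower()
    tokens = text.split()
    if use_normalization:
        tokens = normalize(tokens)
    if stem is not None:
        tokens = [stem(tok) for tok in tokens]
    return tokens
```

Switching the flags reproduces different scenario combinations. Without cleaning, a token such as `bgt!` keeps its punctuation and no longer matches the dictionary, which illustrates why the steps interact. For real Indonesian text, a stemmer such as the one in the PySastrawi package (`StemmerFactory().create_stemmer().stem`) could be passed as `stem`; whether the study used that library is not stated.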
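Feature extraction with N-grams and TF-IDF weighting can be illustrated with a short pure-Python sketch. It assumes tweets are already tokenized, joins the tokens of each n-gram with a space, and uses the plain weighting tf × ln(N / df), where tf is the raw count of a term in a tweet, N the number of tweets, and df the number of tweets containing the term. The abstract does not say which TF-IDF variant the study used, and libraries differ (scikit-learn's `TfidfVectorizer`, for example, smooths the IDF by default).

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return contiguous n-grams joined by a space (n=1 unigrams, n=2 bigrams)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs, n=1):
    """Weight each n-gram in each document by tf * ln(N / df).

    docs: list of token lists. Returns one {term: weight} dict per document.
    """
    grams = [Counter(ngrams(doc, n)) for doc in docs]
    df = Counter(term for g in grams for term in g)  # document frequency
    n_docs = len(docs)
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in g.items()}
        for g in grams
    ]
```

Calling `tf_idf(docs, n=1)` and `tf_idf(docs, n=2)` gives unigram and bigram feature sets of the kind compared in the study. Bigram features tend to be sparser, since a pair of adjacent words recurs less often than either word alone. With this unsmoothed variant, a term that occurs in every tweet receives weight 0.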
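Mutual Information feature selection scores each term by how much knowing whether the term occurs in a tweet tells us about the tweet's sentiment label: I(U;C) = Σ_u Σ_c P(u, c) · log2[P(u, c) / (P(u) · P(c))], where U indicates presence of the term and C is the class. The sketch below estimates these probabilities from raw counts and keeps the k highest-scoring terms. The function names and the top-k cutoff `k` are illustrative assumptions, since the abstract does not state how many features were retained.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """MI in bits between 'term occurs in the tweet' and the sentiment label.

    docs: list of token lists; labels: one class label per document.
    Cells with zero count are skipped, following the convention 0 * log 0 = 0.
    """
    n = len(docs)
    pairs = [(term in doc, label) for doc, label in zip(docs, labels)]
    joint = Counter(pairs)
    presence = Counter(u for u, _ in pairs)
    classes = Counter(c for _, c in pairs)
    mi = 0.0
    for (u, c), count in joint.items():
        p_uc = count / n
        mi += p_uc * math.log2(p_uc / ((presence[u] / n) * (classes[c] / n)))
    return mi


def select_top_k(docs, labels, k):
    """Keep the k terms with the highest MI (ties broken alphabetically)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    scores = {term: mutual_information(docs, labels, term) for term in vocab}
    return sorted(vocab, key=lambda term: (-scores[term], term))[:k]
```

A term that appears equally often in every class scores 0 and is dropped first, which is how this kind of selection removes features that carry no information about sentiment.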
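The abstract does not describe how the SVM was trained. As a self-contained illustration only, the sketch below trains a linear SVM on sparse feature dictionaries with a Pegasos-style hinge-loss subgradient method, passing over the data in a fixed order instead of sampling at random, and handles the three sentiment classes with one-vs-rest. The regularization strength `lam`, the number of epochs, and the absence of a bias term are assumptions; in practice a library implementation such as scikit-learn's `LinearSVC` or `SVC` would normally be used.

```python
def dot(w, x):
    """Dot product of a weight dict and a sparse feature dict."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(X, y, lam=0.01, epochs=20):
    """Binary linear SVM (hinge loss) trained with Pegasos-style updates.

    X: list of sparse {feature: value} dicts; y: list of +1 / -1 labels.
    Visits examples cyclically (original Pegasos samples them at random).
    """
    w = {}
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)         # decreasing step size
            margin = label * dot(w, x)    # computed before the update
            shrink = 1.0 - eta * lam      # step on the L2 regularizer
            for f in w:
                w[f] *= shrink
            if margin < 1.0:              # hinge loss is active
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * label * v
    return w


def train_one_vs_rest(X, labels, **kwargs):
    """Train one binary SVM per class (that class = +1, all others = -1)."""
    return {
        c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels], **kwargs)
        for c in sorted(set(labels))
    }


def predict(models, x):
    """Return the class whose binary SVM gives the highest score."""
    return max(models, key=lambda c: dot(models[c], x))


def accuracy(models, X, labels):
    """Fraction of examples whose predicted class matches the true label."""
    correct = sum(predict(models, x) == lab for x, lab in zip(X, labels))
    return correct / len(labels)
```

The feature dictionaries can be the TF-IDF weights of the selected terms. Comparing `accuracy` on the training set with accuracy on held-out tweets is one way to check the kind of train/test stability the abstract attributes to feature selection.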
License

This work is licensed under a Creative Commons Attribution 4.0 International License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).