Sentiment Analysis of Hate Speech on Twitter Public Figures with AdaBoost and XGBoost Methods

Authors

  • Daffa Ulayya Suhendra, Universitas Telkom, Bandung
  • Jondri Jondri, Universitas Telkom, Bandung
  • Indwiarti Indwiarti, Universitas Telkom, Bandung

DOI:

https://doi.org/10.30865/mib.v6i3.4394

Keywords:

Twitter, Hate Speech, Sentiment Analysis, AdaBoost, XGBoost

Abstract

Public figures are often scrutinized by social media users, whether for what they say or for the roles they play in a television series. Public figures generally post content on their social media accounts to help shape their image, but not everyone who sees these posts responds favorably, and some viewers dislike them. This study aims to determine public sentiment towards the public figure Anya Geraldine as expressed on Twitter in Indonesian. Classification uses the Adaptive Boosting (AdaBoost) and Extreme Gradient Boosting (XGBoost) methods, with text preprocessing consisting of cleaning, case folding, tokenization, and filtering. The data are Indonesian-language tweets retrieved with the keyword “@anyaselalubenar”, a total of 7,475 tweets divided into 6,887 positive and 588 negative tweets. Because the labels are imbalanced, oversampling is applied to reduce overfitting problems. The features are TF-IDF weights. Four experimental scenarios were carried out to validate the effectiveness of the models: (1) model performance without oversampling, (2) model performance with oversampling, (3) model performance with undersampling, and (4) model performance with hyperparameter tuning. The experimental results show that XGBoost+SMOTE+Hyperparameter achieved 95%, compared with 87% for AdaBoost+SMOTE+Hyperparameter. Applying SMOTE and hyperparameter tuning was found to overcome the data imbalance and give better classification results.
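The preprocessing and TF-IDF steps named in the abstract can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the stopword list, the cleaning rules, the example tweets, and the TF-IDF variant (term frequency times log(N/df)) are all hypothetical choices, since the abstract does not specify them.

```python
import math
import re
from collections import Counter

# Illustrative stopword list only; the study's actual list is not given.
STOPWORDS = {"yang", "dan", "di", "ke", "ini", "itu"}


def preprocess(tweet):
    """Apply cleaning, case folding, tokenization, and filtering in order."""
    text = re.sub(r"https?://\S+|[@#]\w+", " ", tweet)  # cleaning: URLs, mentions, hashtags
    text = re.sub(r"[^A-Za-z\s]", " ", text)            # cleaning: digits and punctuation
    text = text.lower()                                  # case folding
    tokens = text.split()                                # tokenization on whitespace
    return [t for t in tokens if t not in STOPWORDS]     # filtering: drop stopwords


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up example tweets, not taken from the study's dataset.
tweets = [
    "@anyaselalubenar Kamu keren banget!! https://t.co/xyz",
    "Acara ini keren dan lucu",
    "Tidak suka acara itu",
]
tokenized = [preprocess(t) for t in tweets]
features = tfidf(tokenized)
```

Under this variant, a term that occurs in every document receives a weight of zero, while terms concentrated in few tweets are weighted more heavily. The resulting weights would then be resampled (for example with SMOTE) and passed to the AdaBoost or XGBoost classifier; those steps are omitted here.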

Published

2022-07-25

Section

Articles