Hate Speech Hashtag Classification on Twitter Using the Hybrid Classifier Method

Aulia Rayhan Syaifullah; Yuliant Sibaroni

doi:10.30865/jurikom.v9i4.4548

Authors

Aulia Rayhan Syaifullah Telkom University, Bandung
Yuliant Sibaroni Telkom University, Bandung

DOI:

https://doi.org/10.30865/jurikom.v9i4.4548

Keywords:

Twitter, Hate Speech, Hybrid Classifier, Classification, Social Media

Abstract

Hate speech on social media, especially Twitter, often takes the form of racism, sexism, or politically motivated attacks aimed at particular individuals or groups. Such speech can trigger crime, riots, violence, and even resistance against individuals or groups. A process for classifying whether a tweet is hate speech is therefore needed to reduce the abuse that occurs on Twitter. The technology most commonly used for hate speech classification is the neural network, which requires user data and metadata. In previous studies, the Naïve Bayes (NB) method has been applied with unigram and bigram features and feature selection, reaching an accuracy of 80-85%. The k-Nearest Neighbor (kNN) method has also been used, with an accuracy of 70-85% on the classification of hate speech by political figures. Meanwhile, the most widely used method is the Support Vector Machine (SVM), with accuracies ranging from 70% to 95%. To obtain higher accuracy, this study applies a Hybrid Classifier to hate speech hashtag classification, combining the Multilayer Perceptron (MLP), kNN, and NB methods. The data used in this study are tweets posted from November 2021 to June 2022 under trending hashtags. The average accuracies obtained with MLP, kNN, and NB were 72%, 63%, and 73%, respectively. To improve on the classification results of the three individual methods, they were combined in a Hybrid Classifier. Experimental results show that the Hybrid Classifier with the voting method can increase accuracy up to 74%. The hybrid thus provides better system performance than each of the three classifiers in its composition, namely kNN, NB, and MLP.
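The voting step described in the abstract can be illustrated with a minimal sketch. The abstract does not say whether the study used hard or soft voting, so the code below assumes hard (majority) voting over per-tweet labels. The function names `majority_vote` and `accuracy` and all label values are hypothetical and are not taken from the study.

```python
from collections import Counter


def majority_vote(*predictions):
    """Combine per-sample labels from several classifiers by hard voting."""
    if not predictions:
        raise ValueError("at least one prediction list is required")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all prediction lists must have the same length")
    combined = []
    for votes in zip(*predictions):
        # Pick the label that most classifiers agreed on for this sample.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels (1 = hate speech, 0 = not hate speech).
# These are NOT data from the study; they only illustrate the mechanism.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
mlp_pred = [1, 0, 0, 1, 0, 1, 1, 0]
knn_pred = [1, 1, 1, 0, 0, 0, 1, 0]
nb_pred = [0, 0, 1, 1, 0, 0, 0, 0]

hybrid_pred = majority_vote(mlp_pred, knn_pred, nb_pred)

for name, pred in [("MLP", mlp_pred), ("kNN", knn_pred),
                   ("NB", nb_pred), ("Hybrid", hybrid_pred)]:
    print(f"{name}: {accuracy(y_true, pred):.2f}")
```

In this toy data the three classifiers never err on the same tweet, so the vote corrects every individual mistake. Real classifiers make correlated errors, which is consistent with the modest gain reported in the abstract (up to 74% against 72%, 63%, and 73%).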

