Application of Support Vector Machine and FastText to Detect Hate Speech and Abusive Language on Twitter
DOI:
https://doi.org/10.30865/mib.v7i1.5408
Keywords:
Abusive Language, Hate Speech, Support Vector Machine, FastText, Text Classification
Abstract
Hate speech and abusive language have become increasingly common on social media. With advancing technology and the rapid growth of the internet, anyone can now post hate speech or offensive expressions on platforms such as Twitter, which often leads to conflicts on those platforms. Automatic detection of offensive content and hate speech is therefore recommended, especially on the user application side, to filter tweets that damage social life in the real world. The purpose of this research is to build a classification model using a Support Vector Machine (SVM) with FastText word embedding features to determine whether a tweet contains hate speech and/or abusive language. Our contribution is an improvement in performance over the baseline SVM, obtained with FastText word embedding features as input. The experimental results are also compared with several machine learning methods previously reported on the same dataset of 13,167 tweets. The best SVM model yields an average accuracy of 82.65%, with accuracies of 84.92%, 86.60%, and 76.43% for the hate speech class, the abusive language class, and the hate speech level, respectively. These results are better than those of conventional machine learning methods but do not exceed those achieved by deep learning.
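The general idea of feeding word embedding features into an SVM can be illustrated with a minimal sketch. This is not the authors' implementation: the two-dimensional vectors in `TOY_VECTORS` are hypothetical stand-ins for real FastText embeddings, mean pooling of word vectors into a tweet vector is an assumption (the abstract does not say how tweets are pooled), the four example tweets and their labels are invented, and the classifier is a plain linear SVM trained by subgradient descent on the hinge loss for a single binary label rather than the multi-label setup described above.

```python
# Toy 2-D "embeddings": hypothetical stand-ins for FastText vectors,
# NOT output of a real FastText model.
TOY_VECTORS = {
    "bodoh": [1.0, 0.0],
    "benci": [0.9, 0.1],
    "kamu": [0.5, 0.5],
    "kasih": [0.1, 0.9],
    "terima": [0.0, 1.0],
}


def tweet_vector(text):
    """Mean-pool the vectors of known words (pooling choice is an assumption)."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return [0.0, 0.0]  # no known word: fall back to the zero vector
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def score(w, b, x):
    """Linear decision function w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=200):
    """Linear SVM via per-sample subgradient descent on the L2-regularised hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * score(w, b, xi)
            # The L2 penalty shrinks the weights on every step.
            w = [wj * (1 - lr * lam) for wj in w]
            if margin < 1:
                # Sample is inside the margin: push the boundary away from it.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    """Return +1 (flagged) or -1 (not flagged)."""
    return 1 if score(w, b, x) >= 0 else -1


# Invented training tweets; +1 = abusive/hate, -1 = neutral (illustrative labels).
tweets = ["kamu bodoh", "benci kamu", "terima kasih", "kasih kamu"]
y = [1, 1, -1, -1]
X = [tweet_vector(t) for t in tweets]

w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

In a real pipeline the lookup table would be replaced by a trained FastText model, which can also produce vectors for out-of-vocabulary words from subword n-grams, and the hand-written trainer by a library SVM with tuned kernel and regularisation settings.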
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).