Big Five Personality Prediction of Twitter Users Using the Decision Tree Method with a BERT Semantic Approach

 (*) Jammie Reyhan Widyanto (Telkom University, Bandung, Indonesia)
 Erwin Budi Setiawan (Telkom University, Bandung, Indonesia)

(*) Corresponding Author

Submitted: January 10, 2024; Published: April 30, 2024

Abstract

Today, an individual's personality can be observed easily. There are several approaches to classifying personality, one of which is the Big Five model. It consists of five dimensions: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. One way to learn an individual's personality is through their social media activity, because almost everyone now uses social media. Twitter is one platform that is still widely used; it consists of tweets posted by individuals, each limited to 280 characters. Several studies have addressed the Big Five personalities of Twitter users. Building on the problems identified in that earlier research, this study predicts the Big Five personalities of Twitter users using Decision Tree Classification and Regression Tree (CART), Term Frequency-Inverse Document Frequency (TF-IDF), Synthetic Minority Oversampling Technique (SMOTE), Linguistic Inquiry and Word Count (LIWC), and Bidirectional Encoder Representations from Transformers (BERT). The study aims to determine how these methods apply to Big Five personality prediction and to achieve better accuracy than previous studies. The data, obtained through surveys, comprise 315 Twitter users and 672,866 tweets labeled with Big Five personalities. Applying the CART+TF-IDF+SMOTE+LIWC+BERT combination yielded an accuracy of 97.62%, an increase of 23.1% over the baseline.
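
To make two of the named building blocks concrete, the following is a minimal sketch, not the authors' implementation. It assumes one common TF-IDF variant (relative term frequency times ln(N/df)) and the Gini impurity that CART typically uses to score a split. The tweets, labels, and function names are hypothetical and chosen only for illustration; SMOTE, LIWC, and BERT are not covered.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / doc_length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gini(values, labels, threshold):
    """Weighted Gini impurity after splitting on value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return sum(len(part) / n * gini(part) for part in (left, right) if part)


# Hypothetical toy data, not taken from the study.
tweets = [
    "love meeting new people".split(),
    "love big parties".split(),
    "quiet night reading".split(),
    "reading alone again".split(),
]
labels = ["extraversion", "extraversion", "openness", "openness"]

weights = tf_idf(tweets)
# Use the TF-IDF weight of one term as a single numeric feature.
love_feature = [w.get("love", 0.0) for w in weights]

print(gini(labels))                          # impurity before any split
print(split_gini(love_feature, labels, 0.0)) # impurity after the split
```

In this toy case the split on the weight of "love" separates the two labels completely, so the weighted impurity falls to zero. A CART learner searches over features and thresholds for the split that lowers this impurity most.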

Keywords


CART; Big Five Personality; LIWC; SMOTE; BERT; Twitter


Copyright (c) 2024 JURNAL MEDIA INFORMATIKA BUDIDARMA

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.



JURNAL MEDIA INFORMATIKA BUDIDARMA
STMIK Budi Darma
Secretariat: Sisingamangaraja No. 338 Telp 061-7875998
Email: mib.stmikbd@gmail.com
