Big Five Personality Prediction of Twitter Users Using the Decision Tree Method with a BERT Semantic Approach
DOI:
https://doi.org/10.30865/mib.v8i2.7311
Keywords:
CART, Big Five Personality, LIWC, SMOTE, BERT, Twitter
Abstract
An individual's personality is easy to observe today. There are several approaches to classifying personality, one of which is the Big Five model. The Big Five consists of five dimensions: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. One way to learn about an individual's personality is through their social media activity, since almost everyone now has a social media account. One platform that is still widely used is Twitter, where each user posts tweets of at most 280 characters. Several studies have already addressed the Big Five personalities of Twitter users. Building on the problems identified in that earlier research, this study predicts the Big Five personalities of Twitter users using a Classification and Regression Tree (CART) decision tree, Term Frequency-Inverse Document Frequency (TF-IDF), the Synthetic Minority Oversampling Technique (SMOTE), Linguistic Inquiry and Word Count (LIWC), and Bidirectional Encoder Representations from Transformers (BERT). The study aims to determine how these methods apply to Big Five personality prediction and to achieve better accuracy than previous studies. The data comprise 315 Twitter users and 672,866 tweets, collected through surveys and labeled with Big Five personalities. Applying the combined CART+TF-IDF+SMOTE+LIWC+BERT method yields an accuracy of 97.62%, an increase of 23.1% over the baseline.
References
A. C. Rosito, “Eksplorasi Tipe Kepribadian Big Five Personality Traits Dan Pengaruhnya Terhadap Prestasi Akademik,” J. Psikol. Pendidik. dan Konseling J. Kaji. Psikol. Pendidik. dan Bimbing. Konseling, vol. 4, no. 1, p. 6, 2018, doi: 10.26858/jpkk.v4i1.3250.
R. Zenico, E. B. Setiawan, and F. N. Nugraha, “Prediksi Big Five Personality dengan Term Frequency Inverse Document Frequency (TF-IDF) Menggunakan Metode Logistic Regression pada Pengguna Twitter,” e-Proceeding Eng. Telkom Univ., vol. 6, no. 2, pp. 9939–9945, 2019.
E. Yuliani, E. Utami, and R. Suwanto, “Klasifikasi Kepribadian Pengguna Media Sosial,” J. Inf. Politek. Indonusa Surakarta, vol. 6, 2020.
N. P. Wong, F. N. S. Damanik, Christine, E. S. Jaya, and R. Rajaya, “Perbandingan Algoritma C4.5 dan Classification and Regression Tree (CART) Dalam Menyeleksi Calon Karyawan,” J. SIFO Mikroskil, vol. 20, no. 1, 2019, doi: 10.55601/jsm.v20i1.622.
K. D. W. Rahayu, R. Andreswari, and E. Sutoyo, “Analisis dan Deteksi Fraud pada Data Panggilan Menggunakan Algoritma Decision Tree pada PT XYZ,” 2020.
R. Irmanita, S. S. Prasetiyowati, and Y. Sibaroni, “Classification of Malaria Complication Using CART (Classification and Regression Tree) and Naïve Bayes,” J. RESTI (Rekayasa Sist. dan Teknol. Informasi), vol. 5, no. 1, pp. 10–16, 2021, doi: 10.29207/resti.v5i1.2770.
S. Qaiser and R. Ali, “Text Mining: Use of TF-IDF to Examine the Relevance of Words to Documents,” Int. J. Comput. Appl., vol. 181, no. 1, pp. 25–29, 2018, doi: 10.5120/ijca2018917395.
D. Elreedy and A. F. Atiya, “A Comprehensive Analysis of Synthetic Minority Oversampling Technique (SMOTE) for handling class imbalance,” Inf. Sci. (Ny)., vol. 505, pp. 32–64, 2019, doi: 10.1016/j.ins.2019.07.070.
G. D. Salsabila and E. B. Setiawan, “Semantic Approach for Big Five Personality Prediction on Twitter,” J. RESTI (Rekayasa Sist. dan Teknol. Informasi), vol. 5, no. 4, pp. 680–687, 2021, doi: 10.29207/resti.v5i4.3197.
C. A. Putri, “Analisis Sentimen Review Film Berbahasa Inggris Dengan Pendekatan Bidirectional Encoder Representations from Transformers,” JATISI (Jurnal Tek. Inform. dan Sist. Informasi), vol. 6, no. 2, pp. 181–193, 2020, doi: 10.35957/jatisi.v6i2.206.
R. P. Pratama, “Prediksi Tipe Kepribadian Big Five Pada Pengguna Twitter Menggunakan Metode Random Forest,” 2021.
R. Ellandi, E. B. Setiawan, and F. N. Nugraha, “Prediksi kepribadian Big Five dengan Term-Frequency Inverse Document Frequency Menggunakan Metode k-Nearest Neighbor pada Twitter,” e-Proceeding Eng., vol. 6, no. 2, pp. 9955–9962, 2019.
N. D. Saputra, “Penggunaan Metode Classification and Regression Tree (CART) dalam Mengklasifikasikan Pasien Penderita DBD di Rumah Sakit Anwar Makkatutu Kabupaten Bantaeng,” 2021.
A. N. Bayhaqi and E. B. Setiawan, “Penilaian Big Five Personality pada Twitter Menggunakan Metode Logistic Regression dengan IndoBERT,” 2022.
R. Nashir, E. B. Setiawan, and D. Adytia, “The Influence of Sentiment on the Movement of Bank Mandiri (BMRI) Stock Price with Word2Vec Feature Expansion and the Naïve Bayes-Support Vector Machine (NBSVM) Classifier,” 2022.
S. Tangirala, “Evaluating the Impact of GINI Index and Information Gain on Classification using Decision Tree Classifier Algorithm,” Int. J. Adv. Comput. Sci. Appl., vol. 11, no. 2, pp. 612–619, 2020, doi: 10.14569/ijacsa.2020.0110277.
S. Garg and A. Garg, “Comparison of machine learning algorithms for content based personality resolution of tweets,” Soc. Sci. Humanit. Open, vol. 4, no. 1, p. 100178, 2021, doi: 10.1016/j.ssaho.2021.100178.
F. I. Nur Haq and E. B. Setiawan, “Implementasi Naive Bayes Classifier untuk Prediksi Kepribadian Big Five pada Twitter Menggunakan Term Frequency-Inverse Document Frequency (TF-IDF) dan Term Frequency Relevance Frequency (TF-RF),” e-Proceeding Eng., vol. 6, no. 2, 2019.
G. Safitri and E. B. Setiawan, “Prediksi Kepribadian Pengguna Twitter dengan Support Vector Machine (SVM),” 2021.
B. A. C. Martani and E. B. Setiawan, “Naïve Bayes-Support Vector Machine Combined BERT to Classified Big Five Personality on Twitter,” J. RESTI (Rekayasa Sist. dan Teknol. Informasi), vol. 6, no. 6, pp. 1072–1078, 2022, doi: 10.29207/resti.v6i6.4378.
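The abstract names CART as the classifier but does not describe how the tree chooses its splits. As a rough illustration only, the sketch below assumes the Gini impurity criterion commonly associated with CART and searches for the best threshold on a single numeric feature. The function names, the toy word counts, and the "low"/"high" labels are hypothetical and are not taken from the study's data or implementation.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best binary split on one feature.

    Candidate thresholds are midpoints between consecutive distinct sorted values.
    If no split improves on the unsplit impurity, the threshold is None.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_score = None, gini(labels)
    for i in range(1, n):
        lower, upper = pairs[i - 1][0], pairs[i][0]
        if lower == upper:
            continue  # no threshold can separate identical feature values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = (lower + upper) / 2, score
    return best_threshold, best_score


# Hypothetical toy data: a word count per user and a made-up trait level.
word_counts = [1, 2, 3, 7, 8, 9]
trait_levels = ["low", "low", "low", "high", "high", "high"]

threshold, impurity = best_split(word_counts, trait_levels)
print(threshold, impurity)
```

A full tree would apply this search to every feature at every node and recurse on the two resulting subsets. In the pipeline the abstract describes, those features would come from TF-IDF, LIWC, and BERT, with SMOTE used to balance the classes.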
License

This work is licensed under a Creative Commons Attribution 4.0 International License