Big Five Personality Detection Based on Social Media Using Pre-Trained IndoBERT Model and Gaussian Naive Bayes

Authors

  • Ni Made Dwipadini Puspitarini, Telkom University, Bandung
  • Yuliant Sibaroni, Telkom University, Bandung
  • Sri Suryani Prasetiyowati, Telkom University, Bandung

DOI:

https://doi.org/10.30865/mib.v7i1.5439

Keywords:

Big Five Personality, Combined Model, Gaussian Naive Bayes, IndoBERT, Log Probability Value, Personality Detection

Abstract

Personality offers a thorough understanding of a person and plays a significant role in how well they will perform at work. This has drawn researchers to develop personality detection systems. Although much research on personality detection through social media has been conducted, the task remains challenging, especially with conventional machine learning, which on its own is still insufficient to achieve strong performance. The purpose of this research is to detect Big Five personalities from Indonesian tweets and to improve performance by combining machine learning with deep learning, namely Gaussian Naive Bayes and the IndoBERT model. The proposed combined model sums the log probability vectors produced by the two models. The dataset consists of 3,342 tweets gathered from 111 Twitter accounts. Min-max normalization was also applied to rescale the data. The results show that, on the entire dataset, the combined model's accuracy is higher than that of Gaussian Naive Bayes by 5.42% and that of IndoBERT by almost 2%, which indicates that the combined model outperforms both individual models.
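
The two operations named in the abstract, min-max normalization and summing per-class log probability vectors, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the five-class label list, and the probability values are hypothetical and chosen only to show the mechanics. In practice the two vectors would come from the trained Gaussian Naive Bayes and IndoBERT classifiers.

```python
import math


def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1].

    A constant input (max == min) is mapped to all zeros to avoid
    dividing by zero.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combine_log_probs(log_probs_a, log_probs_b):
    """Sum two per-class log probability vectors element-wise."""
    if len(log_probs_a) != len(log_probs_b):
        raise ValueError("both models must score the same classes")
    return [a + b for a, b in zip(log_probs_a, log_probs_b)]


def predict(log_probs_a, log_probs_b, labels):
    """Return the label whose summed log probability is highest."""
    combined = combine_log_probs(log_probs_a, log_probs_b)
    best = max(range(len(combined)), key=combined.__getitem__)
    return labels[best]


# Hypothetical label set and class probabilities for a single account.
LABELS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]
gnb_probs = [0.10, 0.40, 0.30, 0.15, 0.05]   # stand-in for Gaussian Naive Bayes
bert_probs = [0.05, 0.30, 0.50, 0.10, 0.05]  # stand-in for IndoBERT

gnb_log = [math.log(p) for p in gnb_probs]
bert_log = [math.log(p) for p in bert_probs]

predicted = predict(gnb_log, bert_log, LABELS)
normalized = min_max_normalize([2, 4, 6])
```

Summing log probabilities is equivalent to multiplying the two models' probabilities, so a class must be plausible under both models to win. In the toy numbers above the two models disagree on their top class when used alone, and the summed vector settles the disagreement.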

Published

2023-01-28