Personality Detection On Twitter User With RoBERTa
DOI:
https://doi.org/10.30865/mib.v7i1.5598
Keywords:
Twitter, Personality Classification, Big Five Personality, RoBERTa, Hyperparameter
Abstract
Social media lets users post status updates about themselves. Twitter is one such platform: its users can express themselves easily by posting tweets to their accounts. This activity can indirectly describe the personality of the account owner. One form of personality classification that can be used is the Big Five, a theory that classifies individual character into five personality types: openness, conscientiousness, extraversion, agreeableness, and neuroticism. In the work environment, personality significantly affects which kind of work suits a person. A personality test administered manually takes longer and costs more, so machine learning is needed to detect personality from social media. Using the RoBERTa model for personality classification, supported by a dataset of Twitter tweets, a personality detection system can be built. By determining the optimal ratio of training data to test data and tuning the hyperparameters, the RoBERTa model reaches a classification accuracy of 57.14%.
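The search over split ratios and hyperparameters described in the abstract can be sketched in plain Python. This is only an illustrative outline under stated assumptions: the helper names (`split_dataset`, `accuracy`, `search_space`, `best_config`) and every value in the example grid are hypothetical and are not taken from the paper, and the `evaluate` callback stands in for the actual step of fine-tuning RoBERTa and scoring it on the test split.

```python
from itertools import product
import random


def split_dataset(samples, train_ratio, seed=42):
    """Shuffle a copy of the samples and split it into (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def search_space(train_ratios, learning_rates, batch_sizes, epochs):
    """Enumerate every combination of split ratio and hyperparameters."""
    keys = ("train_ratio", "learning_rate", "batch_size", "epochs")
    grid = product(train_ratios, learning_rates, batch_sizes, epochs)
    return [dict(zip(keys, combo)) for combo in grid]


def best_config(configs, evaluate):
    """Return the configuration with the highest score from evaluate(config).

    `evaluate` is a placeholder: in practice it would fine-tune the model
    with the given configuration and return its test accuracy.
    """
    return max(configs, key=evaluate)


# Hypothetical grid; the values actually used in the paper are not given here.
example_configs = search_space(
    train_ratios=[0.7, 0.8, 0.9],
    learning_rates=[1e-5, 2e-5],
    batch_sizes=[16, 32],
    epochs=[3],
)
```

Keeping the split ratio inside the same grid as the hyperparameters means each candidate is scored in the same way, so the reported accuracy corresponds to one explicit combination of ratio and settings.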
License

This work is licensed under a Creative Commons Attribution 4.0 International License