Explainable AI: Identification of Writing from Famous Figures in Indonesia Using BERT and Naive Bayes Methods

Authors

  • Firdaus Putra Kurniyanto, Telkom University, Bandung
  • Agus Hartoyo, Telkom University, Bandung

DOI:

https://doi.org/10.30865/mib.v7i1.5392

Keywords:

BERT, Explainable AI, Figure, Indonesia, LIME, Naïve Bayes, Writings

Abstract

Identifying the writings of well-known figures in Indonesia is a form of appreciation for the writing itself. Knowing the language style of each famous figure reveals what makes each writer distinctive and helps us understand the thoughts and ideas they convey. This topic has not previously been studied, which makes it worth investigating. Because this study covers only a few writers, it cannot yet characterize the language style of every famous figure in Indonesia. In this study, a system was built to determine the language style of well-known Indonesian figures from their writing, using the BERT and Naïve Bayes algorithms for classification and LIME for the explainable AI process. The results show that BERT classifies the texts better, with an accuracy of 92% compared with 90% for Naïve Bayes. The study also found that KH. Abdurrahman Wahid and Emha Ainun Nadjib have almost the same language style: their writings contain many words with political and religious elements. Dahlan Iskan's writing contains many words with political and socio-cultural elements, while Pramoedya Ananta Toer's writing uses many pronouns.
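To make the idea of word-level evidence in author classification concrete, the following is a minimal sketch of a multinomial Naive Bayes classifier written with the Python standard library only. It is not the authors' implementation and it does not use LIME: the explanation step simply ranks words by their log-likelihood ratio between two classes, which is a simpler stand-in for a model-agnostic explainer. The labels `author_a` and `author_b`, the toy sentences, and all function names are hypothetical placeholders, not data from the study.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with Laplace smoothing (alpha)."""
    word_counts = defaultdict(Counter)  # per-class word frequencies
    doc_counts = Counter(labels)        # per-class document counts
    for text, label in zip(docs, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    model = {"log_prior": {}, "log_lik": {}, "vocab": vocab}
    for label, n_docs in doc_counts.items():
        model["log_prior"][label] = math.log(n_docs / len(docs))
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        model["log_lik"][label] = {
            w: math.log((word_counts[label][w] + alpha) / total) for w in vocab
        }
    return model


def predict(model, text):
    """Return the class with the highest posterior log-score."""
    tokens = [w for w in text.lower().split() if w in model["vocab"]]
    scores = {
        label: prior + sum(model["log_lik"][label][w] for w in tokens)
        for label, prior in model["log_prior"].items()
    }
    return max(scores, key=scores.get)


def explain(model, text, label, other):
    """Rank known words by how strongly they favour `label` over `other`."""
    tokens = {w for w in text.lower().split() if w in model["vocab"]}
    contrib = {
        w: model["log_lik"][label][w] - model["log_lik"][other][w]
        for w in tokens
    }
    return sorted(contrib.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy corpus: two invented "authors" with different vocabularies.
docs = [
    "politics and religion shape the nation",
    "religion guides politics and faith",
    "he said that she saw him and them",
    "she told him that they knew her",
]
labels = ["author_a", "author_a", "author_b", "author_b"]
model = train_naive_bayes(docs, labels)

print(predict(model, "faith and politics"))
print(explain(model, "faith and politics", "author_a", "author_b"))
```

In this sketch a positive score means the word is more typical of the first class than of the second. A LIME-style explainer would instead perturb the input text and fit a local surrogate model, which also works for classifiers such as BERT whose internals are not directly interpretable.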

Published

2023-01-28