Explainable AI: Identification of Writing from Famous Figures in Indonesia Using BERT and Naive Bayes Methods
DOI:
https://doi.org/10.30865/mib.v7i1.5392
Keywords:
BERT, Explainable AI, Figure, Indonesia, LIME, Naïve Bayes, Writings
Abstract
Identifying the writings of well-known figures in Indonesia is a form of appreciation for the writing itself. Knowing the language style of each figure reveals what is distinctive about each writer and helps readers understand the thoughts and ideas they convey. This kind of research has not yet been done, so the topic remains open for further study. This study covers only a few writers, so it cannot characterize the language style of every famous figure in Indonesia. A system was built to determine the language style of well-known figures in Indonesia from their writing, using BERT and Naïve Bayes for classification, with LIME for the explainable AI process. BERT classifies the texts better, with an accuracy of 92%, compared with 90% for Naïve Bayes. The study also found that KH. Abdurrahman Wahid and Emha Ainun Nadjib have almost the same language style: their writings contain many words with political and religious elements. Dahlan Iskan's writing contains many words with political and socio-cultural elements, while Pramoedya Ananta Toer's writing uses many pronouns.
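The pipeline the abstract describes, a text classifier plus a word-level explanation of its predictions, can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a small multinomial Naïve Bayes written from scratch, and in place of LIME it uses a simpler leave-one-word-out perturbation, which only approximates the idea of measuring how much each word contributes to a prediction. The toy corpus, the labels `author_a` and `author_b`, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with Laplace smoothing."""
    counts = defaultdict(Counter)  # per-class word counts
    for text, label in zip(docs, labels):
        counts[label].update(text.lower().split())
    vocab = {w for c in counts.values() for w in c}
    class_sizes = Counter(labels)
    return {
        "alpha": alpha,
        "vocab": vocab,
        "counts": counts,
        "totals": {c: sum(counts[c].values()) for c in class_sizes},
        "log_prior": {c: math.log(k / len(docs)) for c, k in class_sizes.items()},
    }


def predict_proba(model, text):
    """Return P(class | text); words outside the vocabulary are ignored."""
    tokens = [w for w in text.lower().split() if w in model["vocab"]]
    a, v = model["alpha"], len(model["vocab"])
    scores = {}
    for label, log_prior in model["log_prior"].items():
        score = log_prior
        for w in tokens:
            score += math.log(
                (model["counts"][label][w] + a) / (model["totals"][label] + a * v)
            )
        scores[label] = score
    # Normalize log scores into probabilities in a numerically stable way.
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: e / z for c, e in exp.items()}


def explain(model, text, label):
    """Leave-one-word-out importance (a simplified stand-in for LIME).

    The weight of a word is the drop in P(label) when that word is removed;
    positive weights support the label, negative weights argue against it.
    """
    tokens = text.lower().split()
    base = predict_proba(model, text)[label]
    weights = {}
    for w in set(tokens):
        reduced = " ".join(t for t in tokens if t != w)
        weights[w] = base - predict_proba(model, reduced)[label]
    return sorted(weights.items(), key=lambda kv: -kv[1])


# Hypothetical toy corpus; the labels are placeholders, not real authors.
docs = [
    "politics religion democracy faith",
    "religion politics pluralism",
    "he she they walked home",
    "they said she left him",
]
labels = ["author_a", "author_a", "author_b", "author_b"]
model = train_nb(docs, labels)

probs = predict_proba(model, "politics and faith")
ranking = explain(model, "politics faith they", "author_a")
print(max(probs, key=probs.get))
print(ranking)
```

In this sketch, words tied to one placeholder author's vocabulary receive positive weights for that author, and words typical of the other author receive negative weights. That mirrors, in miniature, how a word-level explanation can surface the kinds of vocabulary differences the abstract reports.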
License

This work is licensed under a Creative Commons Attribution 4.0 International License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).