Tagging Efficiency Analysis of Part of Speech Taggers on Indonesian News

Authors

  • Djatnika Widia Nugraha, Telkom University, Bandung
  • Donni Richasdy, Telkom University, Bandung
  • Aditya Firman Ihsan, Telkom University, Bandung

DOI:

https://doi.org/10.30865/mib.v7i1.5384

Keywords:

Part of Speech Tagging, Natural Language Processing, Efficient, Conditional Random Fields, Hidden Markov Model

Abstract

Part-of-speech tagging (POS tagging) is a task in Natural Language Processing (NLP). It automatically labels each word in a sentence with its word class. Many tagging methods exist, each with its own characteristics in application. This research compares two of them: Conditional Random Fields (CRF) and the Hidden Markov Model (HMM). Both models are trained on an Indonesian language corpus, and Indonesian news texts serve as test data, to determine which method is more efficient based on each model's accuracy and training time. The CRF method achieves the best accuracy, 97.68% on the corpus test data and 90.02% on the sample Indonesian news dataset, with a training time of 146.90 seconds. The HMM method reaches a highest accuracy of 94.25% with a shorter training time of 32.45 seconds. On sample sentences containing 116 tokens, the CRF method produces 90.05% accuracy, higher than the 79.31% produced by the HMM method.
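
The kind of comparison described in the abstract (token-level accuracy plus training time) can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes a bigram HMM with add-one smoothing and Viterbi decoding, and the three-sentence corpus, the tag names, and all function names are hypothetical placeholders chosen only for illustration.

```python
import math
import time
from collections import Counter, defaultdict

# Hypothetical toy corpus (NOT the paper's data): lists of (word, tag) pairs.
TRAIN = [
    [("saya", "PRON"), ("makan", "VERB"), ("nasi", "NOUN")],
    [("dia", "PRON"), ("minum", "VERB"), ("air", "NOUN")],
    [("saya", "PRON"), ("minum", "VERB"), ("kopi", "NOUN")],
]
TEST = [[("dia", "PRON"), ("makan", "VERB"), ("nasi", "NOUN")]]


def train_hmm(sentences):
    """Count tag transitions and word emissions for a bigram HMM."""
    trans = defaultdict(Counter)  # previous tag -> counts of next tag
    emit = defaultdict(Counter)   # tag -> counts of emitted words
    vocab = set()
    for sent in sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, vocab


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of `key` under `counter`."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, model):
    """Return the most probable tag sequence for `words`."""
    trans, emit, vocab = model
    tags = list(emit)
    n_words = len(vocab) + 1  # one extra slot for unseen words
    # Each column maps tag -> (best log score ending in that tag, backpointer).
    best = [{
        t: (log_prob(trans["<s>"], t, len(tags))
            + log_prob(emit[t], words[0], n_words), None)
        for t in tags
    }]
    for w in words[1:]:
        col = {}
        for t in tags:
            score, prev = max(
                (best[-1][p][0] + log_prob(trans[p], t, len(tags)), p)
                for p in tags
            )
            col[t] = (score + log_prob(emit[t], w, n_words), prev)
        best.append(col)
    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for col in reversed(best[1:]):
        tag = col[tag][1]
        path.append(tag)
    return path[::-1]


def accuracy(model, sentences):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for sent in sentences:
        words = [w for w, _ in sent]
        gold = [t for _, t in sent]
        pred = viterbi(words, model)
        correct += sum(p == g for p, g in zip(pred, gold))
        total += len(gold)
    return correct / total


start = time.perf_counter()
model = train_hmm(TRAIN)
train_seconds = time.perf_counter() - start
acc = accuracy(model, TEST)
print(f"accuracy={acc:.2%} training_time={train_seconds:.4f}s")
```

Token-level accuracy here is simply correct tags divided by total tokens; as a point of arithmetic, 92 correct tags out of 116 tokens would give 79.31%. A CRF tagger could be timed and scored with the same `accuracy` harness, provided it exposes an equivalent decode step.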

Published

2023-01-28