Tagging Efficiency Analysis of Part of Speech Taggers on Indonesian News

Djatnika Widia Nugraha; Donni Richasdy; Aditya Firman Ihsan

doi:10.30865/mib.v7i1.5384

Authors

Djatnika Widia Nugraha, Telkom University, Bandung
Donni Richasdy, Telkom University, Bandung
Aditya Firman Ihsan, Telkom University, Bandung

DOI:

https://doi.org/10.30865/mib.v7i1.5384

Keywords:

Part of Speech Tagging, Natural Language Processing, Efficient, Conditional Random Fields, Hidden Markov Model

Abstract

Part-of-speech tagging (POS tagging) is a task in Natural Language Processing (NLP): the automatic labeling of each word in a sentence with its word class. Many tagging methods exist, each with its own characteristics in application. This study compares two of them, Conditional Random Fields (CRF) and the Hidden Markov Model (HMM). Both models are trained on an Indonesian-language corpus and tested on Indonesian news texts to determine which method is more efficient, judged by the accuracy and training time of each model. CRF achieves the best accuracy, 97.68% on the corpus test data and 90.02% on the sample Indonesian news dataset, with a training time of 146.90 seconds. HMM reaches its highest accuracy at 94.25%, with a shorter training time of 32.45 seconds. On sample sentences containing 116 tokens, CRF produces 90.05% accuracy, higher than the 79.31% produced by HMM.
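The two efficiency criteria named in the abstract, token-level accuracy and training time, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the helper names `token_accuracy` and `timed`, the toy tag sequences, and the tag labels are all hypothetical, and the timing helper is shown wrapping the scoring function only as a stand-in for a real training call.

```python
import time


def token_accuracy(gold, predicted):
    """Return the percentage of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * correct / len(gold)


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Hypothetical four-token sentence: one of the four tags is wrong.
gold_tags = ["NNP", "VB", "NN", "Z"]
predicted_tags = ["NNP", "VB", "VB", "Z"]

# In a real comparison, `timed` would wrap each model's training routine.
accuracy, seconds = timed(token_accuracy, gold_tags, predicted_tags)
print(f"accuracy = {accuracy:.2f}%")  # prints "accuracy = 75.00%"
```

Scoring per token rather than per sentence is the convention assumed here; a sentence-level score would count a sentence as correct only if every tag in it matched.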

