Tagging Efficiency Analysis of Part of Speech Taggers on Indonesian News
DOI:
https://doi.org/10.30865/mib.v7i1.5384
Keywords:
Part of Speech Tagging, Natural Language Processing, Efficient, Conditional Random Fields, Hidden Markov Model
Abstract
Part-of-speech tagging (POS tagging) is a task in Natural Language Processing (NLP): it automatically labels each word in a sentence with its word class. Many tagging methods exist, and each has its own characteristics in application. This study compares two of them, Conditional Random Fields (CRF) and the Hidden Markov Model (HMM). Both models are trained on an Indonesian-language corpus and tested on Indonesian news texts to determine which method is more efficient, judged by the accuracy and training time of each model. CRF performed best, with an accuracy of 97.68% on the corpus test data and 90.02% on the sample Indonesian news dataset, at a training time of 146.90 seconds. HMM reached a highest accuracy of 94.25%, with a relatively shorter training time of 32.45 seconds. On the sample sentences containing 116 tokens, CRF produced 90.05% accuracy, higher than the 79.31% produced by HMM.
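To make the comparison criteria concrete, the following is a minimal sketch of the HMM side only: a count-based tagger with add-one smoothing and Viterbi decoding, plus the token-level accuracy and training-time measurements the abstract refers to. It is not the authors' implementation. The toy sentences, the tag labels, the smoothing choice, and all function names are assumptions made for illustration, and the CRF model is not sketched here.

```python
import math
import time
from collections import Counter, defaultdict

# Hypothetical toy corpus (not the corpus used in the paper): (word, tag) pairs.
TRAIN = [
    [("saya", "PRP"), ("membaca", "VB"), ("berita", "NN")],
    [("dia", "PRP"), ("menulis", "VB"), ("berita", "NN")],
    [("saya", "PRP"), ("menulis", "VB"), ("buku", "NN")],
]
TEST = [[("dia", "PRP"), ("membaca", "VB"), ("buku", "NN")]]


def train_hmm(sentences):
    """Collect transition and emission counts from tagged sentences."""
    trans = defaultdict(Counter)  # trans[previous_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sent in sentences:
        prev = "<s>"  # sentence-start marker
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return sorted(emit), trans, emit, vocab


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (an assumed smoothing choice)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, model):
    """Return the most probable tag sequence for a list of words."""
    tags, trans, emit, vocab = model
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 slot for unseen words
    # Each column maps tag -> (best log score, backpointer to previous tag).
    cols = [{
        t: (log_prob(trans["<s>"], t, n_tags)
            + log_prob(emit[t], words[0], n_words), None)
        for t in tags
    }]
    for word in words[1:]:
        col = {}
        for t in tags:
            score, prev = max(
                (cols[-1][p][0] + log_prob(trans[p], t, n_tags), p)
                for p in tags
            )
            col[t] = (score + log_prob(emit[t], word, n_words), prev)
        cols.append(col)
    # Backtrack from the best final tag.
    tag = max(tags, key=lambda t: cols[-1][t][0])
    path = [tag]
    for col in reversed(cols[1:]):
        tag = col[tag][1]
        path.append(tag)
    return path[::-1]


def accuracy(gold_sentences, model):
    """Token-level accuracy: correctly tagged tokens / total tokens."""
    correct = total = 0
    for sent in gold_sentences:
        predicted = viterbi([w for w, _ in sent], model)
        correct += sum(p == g for p, (_, g) in zip(predicted, sent))
        total += len(sent)
    return correct / total


if __name__ == "__main__":
    start = time.perf_counter()
    model = train_hmm(TRAIN)
    elapsed = time.perf_counter() - start
    print(f"training time: {elapsed:.6f} s")
    print(f"accuracy on toy test set: {accuracy(TEST, model):.2%}")
```

Accuracy here is simply the share of tokens whose predicted tag matches the gold tag. As an arithmetic illustration, 92 correct tags out of 116 tokens would give 79.31%; the count 92 is inferred from the reported percentage and is not stated in the abstract.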
License

This work is licensed under a Creative Commons Attribution 4.0 International License