Part-of-Speech Tagging Implementation on Telkom University News using Bidirectional LSTM Method

Authors

  • Rheza Ramadhan Putra, Telkom University, Bandung
  • Donni Richasdy, Telkom University, Bandung
  • Aditya Firman Ihsan, Telkom University, Bandung

DOI:

https://doi.org/10.30865/mib.v7i1.5506

Keywords:

POS Tagging, Bidirectional LSTM, Indonesian, News

Abstract

News is disseminated through various media, one of which is the internet. News articles often contain words that are absent from the dictionary, such as slang, as well as foreign words that do not appear in the corpus. This raises the question of how well a POS tagging model built on a corpus can handle word class labeling in Indonesian news. This research examines the POS tagging results on a manually selected collection of news articles about Telkom University. Using a bidirectional LSTM model, three test scenarios were run to improve the performance of the model: applying the best padding for the corpus, comparing the performance of the model built on the modified corpus with that of the model built on the original corpus, and determining the dimension of the Word2vec vectors. The model selected for each corpus was then applied to news that had been labeled manually. One of the best scenario results was obtained by modifying the corpus: double words in the word class "X" were removed, and some "X" entries that are more likely to be foreign words were relabeled as the word class "FW". The best performance on the news about Telkom University was achieved by the bidirectional LSTM model built on the modified corpus, with an accuracy of 92.74%, precision of 92.85%, recall of 92.74%, and F1-score of 92.48%.
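
The four reported figures can be illustrated with a small token-level scoring sketch. The abstract does not say how precision, recall, and F1 were averaged across word classes; the identical accuracy and recall values (92.74%) are consistent with support-weighted averaging, where weighted recall always equals accuracy, so that averaging is assumed here. The function name `weighted_scores` and the toy tag sequences are hypothetical and are not taken from the paper; only the class names "X" and "FW" come from the abstract.

```python
from collections import Counter


def weighted_scores(gold, pred):
    """Accuracy plus support-weighted precision, recall and F1 for flat tag lists.

    Assumption: support-weighted averaging; the paper does not state its method.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")

    support = Counter(gold)    # number of tokens per gold tag
    predicted = Counter(pred)  # number of tokens per predicted tag
    correct = Counter(g for g, p in zip(gold, pred) if g == p)
    total = len(gold)

    accuracy = sum(correct.values()) / total
    precision = recall = f1 = 0.0
    for tag, n in support.items():
        # Per-tag scores; a tag that was never predicted gets precision 0.
        p = correct[tag] / predicted[tag] if predicted[tag] else 0.0
        r = correct[tag] / n
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = n / total  # share of gold tokens carrying this tag
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy example with made-up tags for six tokens (not data from the paper).
gold = ["NN", "VB", "NN", "FW", "X", "NN"]
pred = ["NN", "VB", "FW", "FW", "NN", "NN"]
scores = weighted_scores(gold, pred)
print({name: round(value, 4) for name, value in scores.items()})
```

Under this averaging, a weighted F1 below both precision and recall, as in the reported 92.48%, is possible because F1 is computed per class before the weights are applied.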

Published

2023-01-28