Part-of-Speech Tagging Implementation on Telkom University News using Bidirectional LSTM Method
DOI:
https://doi.org/10.30865/mib.v7i1.5506
Keywords:
POS Tagging, Bidirectional LSTM, Indonesian, News
Abstract
News is disseminated through many media, including the internet. News articles often contain words that are not found in the dictionary, such as slang, as well as foreign words that do not appear in the corpus. This raises the question of how well a POS tagging model built on a corpus can handle word class labeling in Indonesian news. This research examines the results of POS tagging on a manually selected collection of news articles about Telkom University. Using a bidirectional LSTM model, three test scenarios were run to improve model performance: selecting the best padding for the corpus, comparing the performance of a model trained on a modified corpus with one trained on the original corpus, and determining the dimensions of the Word2vec vectors. The selected model from each corpus was then applied to news that had been labeled manually. One of the best-performing scenarios modified the corpus by removing duplicate words in the word class "X" and relabeling words in class "X" that are more likely to be foreign words as class "FW". In the implementation on Telkom University news, the bidirectional LSTM model built on the modified corpus performed best, with an accuracy of 92.74%, precision of 92.85%, recall of 92.74%, and F1-score of 92.48%.
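The abstract reports identical accuracy and recall (92.74%). That pattern is what support-weighted averaging over tags produces, although the abstract does not state which averaging scheme was used. The sketch below is an illustration under that assumption, not the authors' evaluation code. The function name `weighted_scores` and the toy tag sequences are hypothetical, and only the tags "X" and "FW" come from the abstract.

```python
from collections import Counter


def weighted_scores(gold, pred):
    """Accuracy plus support-weighted precision, recall and F1 for two tag lists.

    Assumption: per-tag scores are averaged with weights proportional to the
    number of gold tokens carrying each tag (the "weighted" average).
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")

    support = Counter(gold)    # gold tokens per tag
    predicted = Counter(pred)  # predicted tokens per tag
    correct = Counter(g for g, p in zip(gold, pred) if g == p)
    total = len(gold)

    precision = recall = f1 = 0.0
    for tag, n in support.items():
        p = correct[tag] / predicted[tag] if predicted[tag] else 0.0
        r = correct[tag] / n
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = n / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / total
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy data (made up for illustration; not taken from the paper).
gold = ["NN", "VB", "NN", "FW", "X", "NN"]
pred = ["NN", "VB", "FW", "FW", "NN", "NN"]
scores = weighted_scores(gold, pred)
print(scores)
```

With support weighting, each tag's recall is multiplied by its share of gold tokens, so the weighted recall reduces to the number of correct tokens divided by the total, which is the accuracy. Precision and F1 do not reduce in this way and can differ from accuracy, as they do in the reported results.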
License

This work is licensed under a Creative Commons Attribution 4.0 International License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).