Multilabel Classification in Indonesian Translation of Religious Text using Word Centrality Term Weighting

 Muhammad Pascal Dewantara (Telkom University, Bandung, Indonesia)
 (*)Kemas Muslim Lhaksmana (Telkom University, Bandung, Indonesia)

(*) Corresponding Author

Submitted: January 15, 2024; Published: April 30, 2024

Abstract

This research focuses on enhancing the understanding of the Quran through its Indonesian translation dataset by employing word centrality as a term weight that feeds into a classifier model. The primary goal is to compare the Hamming Loss obtained with the TF-IDF and TW-IDF feature extraction methods in the Indonesian translation case study. TF-IDF is commonly used in prior research, where it yields a higher Hamming Loss (that is, worse performance) than TW-IDF with centrality measures, specifically degree and closeness centrality. This research adds eigenvector centrality as a new point of comparison against those methods. We use SVM, Random Forest (bagging), and AdaBoost (boosting) as classifier models, with Mutual Information as the feature selection method. The classifiers are evaluated with Hamming Loss because it is suitable for multilabel classification. Results indicate that using centrality values for term weighting offers a significant improvement over regular TF-IDF. Each centrality measure achieves the best Hamming Loss in one of the classifier models: degree centrality scores 0.1275 with SVM, closeness centrality scores 0.1367 with AdaBoost, and eigenvector centrality scores 0.1204 with Random Forest. Overall, eigenvector centrality can still be a strong measure for lowering the Hamming Loss. Random Forest and AdaBoost perform significantly better than SVM.
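To make the two central ideas concrete, the following is a minimal sketch, not the authors' implementation. It assumes an undirected word co-occurrence graph built per document with a sliding window, uses degree centrality as the term weight (TW), multiplies it by one common IDF variant, log((N + 1) / df), and computes Hamming Loss as the fraction of wrongly predicted labels. The function names, the `window` parameter, and the IDF variant are illustrative assumptions that do not come from the paper.

```python
import math
from collections import defaultdict


def degree_centrality_weights(tokens, window=1):
    """Degree centrality of each term in a document's co-occurrence graph.

    Assumption: two terms are linked if they occur within `window`
    positions of each other; the graph is undirected and unweighted.
    """
    neighbors = defaultdict(set)
    for i, term in enumerate(tokens):
        neighbors[term]  # register the term even if it ends up isolated
        for other in tokens[i + 1 : i + 1 + window]:
            if other != term:
                neighbors[term].add(other)
                neighbors[other].add(term)
    n = len(neighbors)
    if n <= 1:
        return {t: 0.0 for t in neighbors}
    # Normalised degree: number of neighbours divided by (n - 1).
    return {t: len(adj) / (n - 1) for t, adj in neighbors.items()}


def tw_idf(docs, window=1):
    """TW-IDF vectors: centrality-based term weight times IDF.

    Assumption: IDF is log((N + 1) / df), one of several common variants.
    """
    n_docs = len(docs)
    df = defaultdict(int)
    for doc in docs:
        for term in set(doc):
            df[term] += 1
    vectors = []
    for doc in docs:
        tw = degree_centrality_weights(doc, window)
        vectors.append(
            {t: w * math.log((n_docs + 1) / df[t]) for t, w in tw.items()}
        )
    return vectors


def hamming_loss(y_true, y_pred):
    """Fraction of label slots predicted incorrectly (lower is better)."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


if __name__ == "__main__":
    # Toy tokenised documents; the tokens are placeholders, not dataset text.
    docs = [["a", "b", "c"], ["a", "d"]]
    print(tw_idf(docs))
    print(hamming_loss([[1, 0, 1], [0, 1, 0]], [[1, 1, 1], [0, 1, 1]]))
```

Closeness or eigenvector centrality could be substituted for the degree-based weight inside `degree_centrality_weights` without changing the rest of the sketch, which is the comparison the abstract describes.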

Keywords


Centrality; Classification; Multilabel; Quran; Weighting





Copyright (c) 2024 JURNAL MEDIA INFORMATIKA BUDIDARMA

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.



JURNAL MEDIA INFORMATIKA BUDIDARMA
STMIK Budi Darma
Secretariat: Sisingamangaraja No. 338 Telp 061-7875998
Email: mib.stmikbd@gmail.com
