Multilabel Classification in Indonesian Translation of Religious Text using Word Centrality Term Weighting
DOI:
https://doi.org/10.30865/mib.v8i2.7351
Keywords:
Centrality, Classification, Multilabel, Quran, Weighting
Abstract
This research aims to improve the understanding of the Quran in its Indonesian translation by using word centrality as the term weight that feeds into a classifier model. The primary goal is to compare the Hamming Loss obtained with the TF-IDF and TW-IDF feature extraction methods on the Indonesian translation case study. TF-IDF, which is commonly used in prior research, yields a higher Hamming Loss (that is, worse classification performance) than TW-IDF with centrality measurement, specifically degree and closeness centrality. This research adds eigenvector centrality as a new point of comparison with those methods. We used SVM, Random Forest (bagging), and AdaBoost (boosting) as the classifier models, with Mutual Information as the feature selection method. The classifiers are evaluated with Hamming Loss because it is suitable for multilabel classification. Results indicate that using a centrality measure as the term weight offers a significant improvement over regular TF-IDF. Each centrality measure gives the best Hamming Loss for one of the classifier models: degree centrality reaches 0.1275 with SVM, closeness centrality reaches 0.1367 with AdaBoost, and eigenvector centrality reaches 0.1204 with Random Forest. Eigenvector centrality can therefore be a strong measure for lowering the Hamming Loss. Random Forest and AdaBoost perform significantly better than SVM.
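The two quantities the abstract compares can be illustrated with a minimal sketch. It is not the authors' implementation: the co-occurrence window size, the IDF form log((N + 1) / df), the toy tokens, and the function names degree_term_weights, tw_idf, and hamming_loss are all assumptions made for illustration. Only degree centrality is shown; closeness and eigenvector centrality would normally come from a graph library.

```python
import math
from collections import defaultdict


def degree_term_weights(tokens, window=2):
    """Degree of each word in a document's co-occurrence graph.

    Two distinct words are linked when they appear within `window`
    consecutive positions (window size is an assumption of this sketch).
    """
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        neighbors[word]  # register the word even if it ends up isolated
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    return {word: len(adjacent) for word, adjacent in neighbors.items()}


def tw_idf(docs, window=2):
    """TW-IDF vectors: centrality-based term weight times an assumed IDF."""
    n_docs = len(docs)
    doc_freq = defaultdict(int)
    for tokens in docs:
        for word in set(tokens):
            doc_freq[word] += 1
    vectors = []
    for tokens in docs:
        weights = degree_term_weights(tokens, window)
        vectors.append(
            {w: tw * math.log((n_docs + 1) / doc_freq[w]) for w, tw in weights.items()}
        )
    return vectors


def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


# Toy tokenized "verses" (hypothetical data, not from the dataset).
docs = [["kitab", "petunjuk", "takwa"], ["kitab", "petunjuk", "iman"]]
vectors = tw_idf(docs)

# Two samples, three labels each; two of the six label slots are wrong.
loss = hamming_loss([[1, 0, 1], [0, 1, 0]], [[1, 1, 1], [0, 1, 1]])
```

In this toy case the middle word of each document has degree 2 and the outer words degree 1, so a word that sits between many distinct neighbors receives a larger weight than raw term frequency would give it. A lower Hamming Loss is better, with 0 meaning every label of every sample was predicted correctly.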
License

This work is licensed under a Creative Commons Attribution 4.0 International License