Single-Label and Multi-Label Text Classification using ANN and Comparison with Naïve Bayes and SVM

Authors

  • M. Mahfi Nurandi Karsana, Telkom University, Bandung
  • Kemas Muslim L., Telkom University, Bandung
  • Widi Astuti, Telkom University, Bandung

DOI:

https://doi.org/10.30865/mib.v7i2.6024

Keywords:

ANN, F1-Macro, Naive Bayes, Text Classification, SVM

Abstract

Improvements in machine learning techniques have made machine learning useful in daily life, and text classification is an important part of the field. Many methods are already used for text classification, such as Artificial Neural Networks (ANN), Naïve Bayes, Support Vector Machines (SVM), and Decision Trees. An ANN is a machine learning model that approximates the function of a biological neural network, and ANNs have been used extensively for classification. This research uses an ANN architecture that is deliberately simple compared to the cutting edge of ANN research and development, in order to show the potential of ANN relative to other classification methods. The performance of ANN, Naïve Bayes, and SVM is measured with F1-macro on multiple single-label and multi-label datasets. In single-label classification, ANN achieves a comparable F1-macro of 0.79, against 0.82 for SVM. In multi-label classification, ANN achieves the best F1-macro of 0.48, against 0.44 for SVM.
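For readers unfamiliar with the metric, the following is a minimal sketch of how F1-macro can be computed, assuming the standard definition (the unweighted mean of per-label F1 scores). The function names and the toy labels are hypothetical and are not taken from the authors' code or datasets. Each document is represented as a set of labels, so single-label data is simply the case where every set has one element.

```python
from typing import Hashable, Sequence, Set


def f1_for_label(tp: int, fp: int, fn: int) -> float:
    """F1 for one label, written as 2TP / (2TP + FP + FN); 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[Set[Hashable]],
             y_pred: Sequence[Set[Hashable]]) -> float:
    """Unweighted mean of per-label F1 over every label seen in either input."""
    labels = set().union(*y_true, *y_pred)
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(label in t and label in p for t, p in pairs)
        fp = sum(label not in t and label in p for t, p in pairs)
        fn = sum(label in t and label not in p for t, p in pairs)
        scores.append(f1_for_label(tp, fp, fn))
    return sum(scores) / len(scores)


# Hypothetical single-label example: one label per document.
single_true = [{"sport"}, {"sport"}, {"tech"}, {"politics"}]
single_pred = [{"sport"}, {"tech"}, {"tech"}, {"politics"}]

# Hypothetical multi-label example: a document may carry several labels.
multi_true = [{"a", "b"}, {"b"}, {"c"}]
multi_pred = [{"a"}, {"b", "c"}, {"c"}]

print(round(macro_f1(single_true, single_pred), 4))
print(round(macro_f1(multi_true, multi_pred), 4))
```

Because every label contributes equally to the mean, a rare label that is classified poorly lowers F1-macro as much as a frequent one does. This sensitivity to rare labels is one plausible reason the multi-label scores reported above are lower than the single-label scores, although the abstract itself does not state the cause.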

Published

2023-04-27

Section

Articles