The Effect of N-Grams on Book Classification Using Feature Extraction and Feature Selection with Multinomial Naïve Bayes

Authors

  • Esti Mulyani, Politeknik Negeri Indramayu, Indramayu
  • Fachrul Pralienka Bani Muhamad, Politeknik Negeri Indramayu, Indramayu
  • Kurnia Adi Cahyanto, Politeknik Negeri Indramayu, Indramayu

DOI:

https://doi.org/10.30865/mib.v5i1.2672

Keywords:

Classification, Feature Extraction, Feature Selection, Multinomial Naïve Bayes, N-Gram

Abstract

A central task of libraries in processing their materials is classifying books according to an established scheme. The Dewey Decimal Classification (DDC) is the scheme most commonly used worldwide to classify (label) books in libraries. Its advantages are that it is universal and systematic. However, applying it is inefficient given the large number of books a library must classify and the need to keep labels in line with updates to the DDC. An automatic classification system can address this problem, and such a system can be built with text mining methods. In this study, features are generated from the words of book titles using N-Grams (Unigram, Bigram, Trigram). The generated features then undergo feature selection, and the book titles are classified with the Multinomial Naïve Bayes algorithm. The study examines the effect of Unigram, Bigram, and Trigram features on book title classification using feature extraction and feature selection with the Multinomial Naïve Bayes algorithm. The test results show that Unigram achieves the highest accuracy, 74.4%.
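
As an illustration of the pipeline described in the abstract, the following minimal Python sketch generates word-level N-grams from book titles, applies a simple document-frequency filter as a stand-in for feature selection (the abstract does not say which selection method the study used), and classifies titles with a Multinomial Naïve Bayes model using Laplace smoothing. The function names, toy titles, and class labels are hypothetical and are not taken from the study.

```python
import math
from collections import Counter, defaultdict


def ngrams(title, n):
    """Return the word-level n-grams of a title as space-joined strings."""
    tokens = title.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def select_features(titles, n, min_df=1):
    """Keep n-grams found in at least min_df titles.

    Assumption: this document-frequency filter only stands in for the
    study's feature selection step, which the abstract does not specify.
    """
    doc_freq = Counter()
    for title in titles:
        doc_freq.update(set(ngrams(title, n)))
    return {gram for gram, count in doc_freq.items() if count >= min_df}


def train(titles, labels, n, vocab):
    """Count titles per class and selected n-gram occurrences per class."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)
    for title, label in zip(titles, labels):
        feature_counts[label].update(g for g in ngrams(title, n) if g in vocab)
    return class_counts, feature_counts


def predict(title, n, vocab, class_counts, feature_counts):
    """Return the class with the highest log-posterior for the title."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        counts = feature_counts[label]
        # Laplace (add-one) smoothing over the selected vocabulary.
        denom = sum(counts.values()) + len(vocab)
        score = math.log(doc_count / total_docs)
        for gram in ngrams(title, n):
            if gram in vocab:  # n-grams outside the vocabulary are ignored
                score += math.log((counts[gram] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: titles labelled with DDC-style main classes.
titles = [
    "introduction to computer programming",
    "computer networks and security",
    "basic physics for students",
    "organic chemistry and physics",
]
labels = ["000", "000", "500", "500"]

n = 1  # 1 = Unigram, 2 = Bigram, 3 = Trigram
vocab = select_features(titles, n)
class_counts, feature_counts = train(titles, labels, n, vocab)
print(predict("computer security basics", n, vocab, class_counts, feature_counts))
```

Changing `n` to 2 or 3 switches the same pipeline to Bigram or Trigram features. Because book titles are short, higher-order N-grams in a new title are less likely to match anything seen in training, which is one plausible reason Unigram features could perform best in a setting like this.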

Author Biographies

Esti Mulyani, Politeknik Negeri Indramayu, Indramayu

Informatics Engineering Study Program

Fachrul Pralienka Bani Muhamad, Politeknik Negeri Indramayu, Indramayu

Informatics Engineering Study Program

Kurnia Adi Cahyanto, Politeknik Negeri Indramayu, Indramayu

Informatics Engineering Study Program

Published

2021-01-22