The Effect of N-Grams on Book Classification Using Feature Extraction and Feature Selection with Multinomial Naïve Bayes

 (*)Esti Mulyani (Politeknik Negeri Indramayu, Indramayu, Indonesia)
 Fachrul Pralienka Bani Muhamad (Politeknik Negeri Indramayu, Indramayu, Indonesia)
 Kurnia Adi Cahyanto (Politeknik Negeri Indramayu, Indramayu, Indonesia)

(*) Corresponding Author

Submitted: December 14, 2020; Published: January 22, 2021

DOI: http://dx.doi.org/10.30865/mib.v5i1.2672

Abstract

One of a library's main tasks is processing library materials by classifying books according to a defined scheme. The Dewey Decimal Classification (DDC) is the method most commonly used worldwide to classify (label) books in libraries. Its advantages are that it is universal and more systematic. However, the method is less efficient given the large number of books a library must classify and the need for labels to follow updates to the DDC. An automatic classification system is a suitable solution to this problem, and automatic classification can be done by applying text mining methods. In this study, words in book titles are extracted as N-grams (unigram, bigram, trigram) to generate features. The generated features then go through feature selection. Book titles are classified using the Multinomial Naïve Bayes algorithm. The study examines the effect of unigrams, bigrams, and trigrams on book title classification using feature extraction and feature selection with the Multinomial Naïve Bayes algorithm. The test results show that unigrams give the highest accuracy, 74.4%.
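
The pipeline described in the abstract (word-level N-gram generation from book titles followed by Multinomial Naïve Bayes classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the helper `ngrams`, the class `MultinomialNB`, the add-one smoothing, and the four toy titles with their class labels are all assumptions made for illustration. The feature selection step is omitted because the abstract does not name the selection criterion.

```python
import math
from collections import Counter, defaultdict


def ngrams(title, n):
    """Return the word-level n-grams of a title as space-joined strings."""
    tokens = title.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (assumed)."""

    def fit(self, docs, labels):
        # docs: list of feature lists (n-grams); labels: one class per doc
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for features, label in zip(docs, labels):
            self.feature_counts[label].update(features)
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        self.total = len(labels)
        return self

    def predict(self, features):
        best_label, best_score = None, -math.inf
        vocab_size = len(self.vocab)
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + vocab_size
            # log prior plus summed log likelihoods of the observed features
            score = math.log(n_docs / self.total)
            for f in features:
                if f in self.vocab:  # skip features never seen in training
                    score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy titles and labels invented for illustration; not data from the study.
titles = [
    ("introduction to database systems", "000"),
    ("python programming for beginners", "000"),
    ("history of modern europe", "900"),
    ("ancient world history atlas", "900"),
]
n = 1  # 1 = unigram, 2 = bigram, 3 = trigram
model = MultinomialNB().fit(
    [ngrams(t, n) for t, _ in titles], [c for _, c in titles]
)
prediction = model.predict(ngrams("modern database programming", n))
print(prediction)
```

Changing `n` to 2 or 3 switches the features to bigrams or trigrams. Short titles yield few higher-order N-grams, and a new title often shares none of them with the training titles, so in this sketch its score falls back to the class prior alone.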

Keywords


Classification; Feature Extraction; Feature Selection; Multinomial Naïve Bayes; N-Gram





Copyright (c) 2021 JURNAL MEDIA INFORMATIKA BUDIDARMA

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.



JURNAL MEDIA INFORMATIKA BUDIDARMA
STMIK Budi Darma
Secretariat: Jln. Sisingamangaraja No. 338, Tel. 061-7875998
Email: mib.stmikbd@gmail.com

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.