The Effect of N-Grams on Book Classification Using Feature Extraction and Feature Selection with Multinomial Naïve Bayes
DOI: https://doi.org/10.30865/mib.v5i1.2672
Keywords: Classification, Feature Extraction, Feature Selection, Multinomial Naïve Bayes, N-Gram
Abstract
A core task of a library is processing its materials, which includes classifying books according to an established scheme. Dewey Decimal Classification (DDC) is the scheme most commonly used worldwide to classify (label) library books. Its advantages are that it is universal and systematic. However, manual DDC labeling is inefficient given the large number of books a library must classify, and existing labels must be kept in line with DDC updates. An automatic classification system can address this problem, and such a system can be built with text mining methods. In this study, N-Grams (Unigram, Bigram, Trigram) over the words of a book title are used for feature generation. The generated features then undergo feature selection. Book titles are classified with the Multinomial Naïve Bayes algorithm. The study examines the effect of Unigram, Bigram, and Trigram features on book title classification when feature extraction and feature selection are combined with Multinomial Naïve Bayes. The test results show that Unigram achieves the highest accuracy, 74.4%.
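The pipeline described in the abstract (word N-grams from a book title, then Multinomial Naïve Bayes) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy titles, the DDC-style labels, the `ngrams` helper, and the small `MultinomialNB` class are all assumptions made for illustration, add-one smoothing is assumed, and the feature selection step is left out because the abstract does not name the method used.

```python
import math
from collections import Counter, defaultdict


def ngrams(title, n):
    """Return the word n-grams of a title as space-joined strings."""
    tokens = title.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (assumed)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)           # documents per class
        self.feature_counts = defaultdict(Counter)    # n-gram counts per class
        for feats, label in zip(docs, labels):
            self.feature_counts[label].update(feats)
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, feats):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)     # log prior
            for f in feats:
                if f in self.vocab:                   # skip unseen n-grams
                    score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: titles with DDC-style main classes (not from the study).
train_titles = [
    "introduction to computer programming",
    "computer networks and data security",
    "history of ancient rome",
    "modern history of asia",
]
train_labels = ["000", "000", "900", "900"]

n = 1  # 1 = unigram, 2 = bigram, 3 = trigram
model = MultinomialNB().fit([ngrams(t, n) for t in train_titles], train_labels)
prediction = model.predict(ngrams("data programming basics", n))
print(prediction)
```

Changing `n` to 2 or 3 switches the features to bigrams or trigrams. Short titles then yield few or no features, which is one plausible reason unigrams can perform better on title-only data.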

This work is licensed under a Creative Commons Attribution 4.0 International License