Implementation of TF-IDF Method and Support Vector Machine Algorithm for Job Applicants Text Classification
Tens of thousands of people apply for jobs at PT. Telkom each year. The goal of the recruitment process is to find new employees who fit PT. Telkom's working culture. Because of the high number of applicants, the recruitment process takes a long time and incurs high costs. We propose a popular combination of Term Frequency-Inverse Document Frequency (TF-IDF) as the feature extraction method and Support Vector Machine (SVM) as the classifier to filter the applicants' interview texts. SVM generally produces better accuracy in text classification than the Random Forest or K-Nearest Neighbors (KNN) algorithms. Several variants of TF-IDF have been developed to address its weaknesses, one of which is Term Frequency-Relevance Frequency (TF-RF). For comparison, this study uses three extraction methods: TF only (without IDF), TF-IDF, and TF-RF. The data source is interview texts from PT. Telkom. SVM achieves an accuracy of 86.31% with TF-IDF, 85.06% with TF only, and 83.61% with TF-RF. The results show that TF-IDF as an extraction method can still outperform TF-RF in terms of accuracy.
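The three weighting schemes compared in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy documents, the labels, and the function name term_weights are hypothetical, the IDF factor is assumed to be ln(N / df), and the RF factor is assumed to take its usual form log2(2 + a / max(1, c)), where a and c count the positive-class and negative-class documents that contain the term. The abstract does not state which variants or preprocessing steps the study used.

```python
import math
from collections import Counter

# Toy, made-up documents with binary labels (1 = positive class).
# These are NOT the PT. Telkom interview texts.
docs = [
    ("team work discipline team", 1),
    ("discipline honest team", 1),
    ("late excuse late", 0),
    ("excuse team late", 0),
]


def term_weights(docs, scheme):
    """Return one {term: weight} dict per document.

    scheme is 'tf', 'tfidf', or 'tfrf' (hypothetical names for this sketch).
    """
    tokenized = [(text.split(), label) for text, label in docs]
    n_docs = len(tokenized)

    df = Counter()   # number of documents containing the term
    pos = Counter()  # positive-class documents containing the term (a)
    neg = Counter()  # negative-class documents containing the term (c)
    for tokens, label in tokenized:
        for term in set(tokens):
            df[term] += 1
            if label == 1:
                pos[term] += 1
            else:
                neg[term] += 1

    weighted = []
    for tokens, _ in tokenized:
        tf = Counter(tokens)  # raw term frequency in this document
        if scheme == "tf":
            w = {t: float(f) for t, f in tf.items()}
        elif scheme == "tfidf":
            # Assumed IDF variant: natural log of N / df, no smoothing.
            w = {t: f * math.log(n_docs / df[t]) for t, f in tf.items()}
        elif scheme == "tfrf":
            # Assumed RF variant: log2(2 + a / max(1, c)).
            w = {t: f * math.log2(2 + pos[t] / max(1, neg[t]))
                 for t, f in tf.items()}
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        weighted.append(w)
    return weighted


if __name__ == "__main__":
    for scheme in ("tf", "tfidf", "tfrf"):
        print(scheme, term_weights(docs, scheme)[0])
```

In this toy corpus, "team" appears in three of the four documents, so TF-IDF down-weights it, while TF-RF boosts it because it occurs in more positive than negative documents. In a full pipeline, the resulting weight vectors would be passed to an SVM classifier, which this sketch leaves out.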
Copyright (c) 2020 JURNAL MEDIA INFORMATIKA BUDIDARMA
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
JURNAL MEDIA INFORMATIKA BUDIDARMA
STMIK Budi Darma