Implementation of TF-IDF Method and Support Vector Machine Algorithm for Job Applicants Text Classification
DOI:
https://doi.org/10.30865/mib.v4i4.2276
Keywords:
Recruitment Process, Feature Extraction, Support Vector Machine, Term Frequency-Inverse Document Frequency, Term Frequency-Relevance Frequency
Abstract
Tens of thousands of people apply for jobs at PT. Telkom each year. The goal of the recruitment process is to find new employees who fit PT. Telkom's working culture. Because of the high number of applicants, the recruitment process is time-consuming and costly. We propose the popular combination of Term Frequency-Inverse Document Frequency (TF-IDF) as the feature extraction method and Support Vector Machine (SVM) as the classifier to filter applicants' interview texts. SVM generally produces better accuracy in text classification than Random Forest or K-Nearest Neighbors (KNN). However, several variants of TF-IDF have been developed to address its weaknesses, one of which is Term Frequency-Relevance Frequency (TF-RF). For comparison, this study uses three extraction methods: TF only (without IDF), TF-IDF, and TF-RF. We use interview texts from PT. Telkom as the data source. Combined with SVM, TF-IDF achieves an accuracy of 86.31%, TF only achieves 85.06%, and TF-RF achieves 83.61%. The results show that TF-IDF still outperforms TF-RF in terms of accuracy.
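To make the three extraction methods concrete, the following is a minimal sketch of how TF, TF-IDF, and TF-RF weights might be computed for a two-class corpus. It is not the authors' implementation. It assumes whitespace tokenization, raw term counts for TF, the common definition idf = ln(N / df), and the commonly cited relevance frequency rf = log2(2 + a / max(1, c)), where a and c are the numbers of positive-class and negative-class documents containing the term. The function name `term_weights`, the toy documents, and the labels "accept" and "reject" are hypothetical.

```python
import math
from collections import Counter


def term_weights(docs, labels, positive_label):
    """Return per-document weight dicts for TF, TF-IDF and TF-RF.

    Assumptions (not taken from the paper): whitespace tokenization,
    raw counts as TF, idf = ln(N / df), rf = log2(2 + a / max(1, c)).
    """
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]

    # Document frequency overall (df) and within the positive class (a).
    df = Counter()
    pos_df = Counter()
    for tokens, label in zip(tokenized, labels):
        for term in set(tokens):
            df[term] += 1
            if label == positive_label:
                pos_df[term] += 1

    result = {"tf": [], "tfidf": [], "tfrf": []}
    for tokens in tokenized:
        tf = Counter(tokens)
        result["tf"].append(dict(tf))
        result["tfidf"].append(
            {t: f * math.log(n_docs / df[t]) for t, f in tf.items()}
        )
        # c = df - a is the number of negative-class documents with the term.
        result["tfrf"].append(
            {
                t: f * math.log2(2 + pos_df[t] / max(1, df[t] - pos_df[t]))
                for t, f in tf.items()
            }
        )
    return result


# Hypothetical toy data; the real study uses PT. Telkom interview texts.
docs = ["team work team", "work alone", "team leader", "alone quiet"]
labels = ["accept", "reject", "accept", "reject"]
weights = term_weights(docs, labels, positive_label="accept")
```

Each resulting dictionary can be turned into a fixed-length vector over the vocabulary and passed to a classifier such as a linear SVM. Note that TF-RF uses the class labels, so its weights must be computed from the training folds only.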
License

This work is licensed under a Creative Commons Attribution 4.0 International License