Implementation of TF-IDF Method and Support Vector Machine Algorithm for Job Applicants Text Classification

Authors

  • Muhammad Faris Luthfi, Universitas Telkom, Bandung
  • Kemas Muslim Lhaksamana, Universitas Telkom, Bandung

DOI:

https://doi.org/10.30865/mib.v4i4.2276

Keywords:

Recruitment Process, Feature Extraction, Support Vector Machine, Term Frequency-Inverse Document Frequency, Term Frequency-Relevance Frequency

Abstract

Tens of thousands of people apply for jobs at PT. Telkom each year. The goal of the recruitment process is to find new employees who fit PT. Telkom's working culture. Because of the high number of applicants, the recruitment process takes a long time and drives up costs. We propose the widely used combination of Term Frequency-Inverse Document Frequency (TF-IDF) as the feature extraction method and Support Vector Machine (SVM) as the classifier to screen applicants' interview texts. SVM generally produces better accuracy in text classification than Random Forest or K-Nearest Neighbors (KNN). TF-IDF, however, has known weaknesses, and several variants have been developed to address them, one of which is Term Frequency-Relevance Frequency (TF-RF). For comparison, this study uses three extraction methods: TF only (without IDF), TF-IDF, and TF-RF. The data source is interview texts from PT. Telkom. SVM combined with TF-IDF achieves an accuracy of 86.31%, with TF only 85.06%, and with TF-RF 83.61%. The results show that TF-IDF still outperforms TF-RF in terms of accuracy on this data.
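To make the difference between the three extraction methods concrete, the following is a minimal sketch of how TF, TF-IDF, and TF-RF weights could be computed for one target class. It is not the authors' implementation. The function name `term_weights`, the toy documents, and the `accept`/`reject` labels are hypothetical. The unsmoothed `log(N / df)` form of IDF is an assumption, since the abstract does not state which variant was used. The relevance frequency follows the commonly cited definition `log2(2 + a / max(1, c))`, where `a` and `c` are the numbers of positive-class and negative-class documents containing the term. The SVM training step is omitted.

```python
import math
from collections import Counter


def term_weights(docs, labels, positive_label):
    """Return, per document, a dict mapping term -> TF, TF-IDF and TF-RF weights.

    docs: list of token lists; labels: one class label per document.
    Hypothetical helper for illustration only.
    """
    n_docs = len(docs)
    df = Counter()      # number of documents containing the term
    df_pos = Counter()  # number of positive-class documents containing the term
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            if label == positive_label:
                df_pos[term] += 1

    weighted = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term counts in this document
        row = {}
        for term, count in tf.items():
            a = df_pos[term]       # positive documents containing the term
            c = df[term] - a       # negative documents containing the term
            idf = math.log(n_docs / df[term])   # assumed unsmoothed IDF
            rf = math.log2(2 + a / max(1, c))   # relevance frequency
            row[term] = {
                "tf": count,
                "tf_idf": count * idf,
                "tf_rf": count * rf,
            }
        weighted.append(row)
    return weighted


# Toy, made-up data: four tokenized "interview answers" with two classes.
docs = [
    ["team", "work", "team"],
    ["team", "late"],
    ["work", "goal"],
    ["late", "excuse"],
]
labels = ["accept", "reject", "accept", "reject"]
weights = term_weights(docs, labels, positive_label="accept")
```

In this toy data, "team" and "work" each appear in two of the four documents, so they receive the same IDF. "work" occurs only in `accept` documents while "team" occurs in one document of each class, so "work" receives the higher relevance frequency. That class-awareness is the idea behind TF-RF; whether it helps in practice depends on the data, as the reported accuracies suggest.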

Author Biographies

Muhammad Faris Luthfi, Universitas Telkom, Bandung

Fakultas Informatika

Kemas Muslim Lhaksamana, Universitas Telkom, Bandung

Fakultas Informatika

Published

2020-10-20