Implementation of TF-IDF Method and Support Vector Machine Algorithm for Job Applicants Text Classification

(*) Muhammad Faris Luthfi (Universitas Telkom, Bandung, Indonesia)
Kemas Muslim Lhaksamana (Universitas Telkom, Bandung, Indonesia)

(*) Corresponding Author

Submitted: June 8, 2020; Published: October 20, 2020

DOI: http://dx.doi.org/10.30865/mib.v4i4.2276

Abstract

Tens of thousands of people apply for jobs at PT. Telkom each year. The goal of the recruitment process is to hire new employees who fit PT. Telkom's working culture. Because of the large number of applicants, the recruitment process is time-consuming and costly. We propose a popular combination of Term Frequency-Inverse Document Frequency (TF-IDF) as the feature extraction method and Support Vector Machine (SVM) as the classifier to filter applicants' interview texts. SVM generally achieves better accuracy in text classification than Random Forest or K-Nearest Neighbors (KNN). Several refinements of TF-IDF have also been proposed to address its weaknesses, one of which is Term Frequency-Relevance Frequency (TF-RF). For comparison, this study uses three extraction methods: TF only (without IDF), TF-IDF, and TF-RF. The data source is interview texts from PT. Telkom. Combining SVM with TF-IDF yields 86.31% accuracy, with TF only 85.06%, and with TF-RF 83.61%. These results show that TF-IDF can still outperform TF-RF in terms of accuracy.
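The three weighting schemes compared above differ only in the global weight multiplied into each raw term frequency: none (TF only), IDF = log(N / df), or relevance frequency RF = log2(2 + a / max(1, c)) as defined by Lan et al., where a and c count the positive- and negative-class training documents containing the term. The sketch below illustrates this under stated assumptions: binary labels, toy already-preprocessed snippets, plain log(N / df) for IDF, L2 normalization, and a linear SVM trained with the Pegasos stochastic subgradient method. The abstract does not give the paper's preprocessing, IDF variant, kernel, SVM library, or parameters, so all data, function names (`term_counts`, `rf_weights`, `run`, ...), and values such as `lam` and `epochs` are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical toy data: already-preprocessed interview snippets (lowercased,
# stop words removed) with assumed binary labels (+1 / -1).
TRAIN = [
    ("teamwork collaboration learning", 1),
    ("teamwork initiative", 1),
    ("collaboration learning initiative", 1),
    ("alone conflict", -1),
    ("late conflict blame", -1),
    ("alone late", -1),
]


def term_counts(docs):
    """TF: raw count of each whitespace-separated term per document."""
    return [Counter(doc.split()) for doc in docs]


def idf_weights(counts):
    """IDF = log(N / df), computed on the training documents only."""
    n = len(counts)
    df = Counter(term for doc in counts for term in doc)
    return {term: math.log(n / d) for term, d in df.items()}


def rf_weights(counts, labels):
    """RF = log2(2 + a / max(1, c)); a and c count the positive and negative
    training documents that contain the term."""
    a, c = Counter(), Counter()
    for doc, y in zip(counts, labels):
        for term in doc:
            (a if y == 1 else c)[term] += 1
    return {t: math.log2(2 + a[t] / max(1, c[t])) for t in set(a) | set(c)}


def to_vectors(counts, global_w=None):
    """Weight each TF by a global term weight (None means TF only), then
    L2-normalize. Terms unseen in training get global weight 0."""
    vectors = []
    for doc in counts:
        v = {t: f * (1.0 if global_w is None else global_w.get(t, 0.0))
             for t, f in doc.items()}
        norm = math.sqrt(sum(x * x for x in v.values()))
        vectors.append({t: x / norm for t, x in v.items()} if norm else v)
    return vectors


def dot(w, x):
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_linear_svm(xs, ys, lam=0.01, epochs=50, seed=0):
    """Linear SVM (L2-regularized hinge loss) trained with Pegasos stochastic
    subgradient steps; no bias term and no projection step, for brevity."""
    rng = random.Random(seed)
    w, step = {}, 0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)
            violated = ys[i] * dot(w, xs[i]) < 1  # margin test uses the old w
            w = {t: v * (1.0 - eta * lam) for t, v in w.items()}  # L2 shrink
            if violated:  # hinge-loss subgradient step
                for t, v in xs[i].items():
                    w[t] = w.get(t, 0.0) + eta * ys[i] * v
    return w


def predict(w, xs):
    return [1 if dot(w, x) > 0 else -1 for x in xs]


def run(scheme, test_docs):
    """Fit the chosen weighting ('tf', 'tfidf' or 'tfrf') and an SVM on TRAIN,
    then label test_docs."""
    docs, labels = zip(*TRAIN)
    counts = term_counts(docs)
    global_w = {
        "tf": None,
        "tfidf": idf_weights(counts),
        "tfrf": rf_weights(counts, labels),
    }[scheme]
    w = train_linear_svm(to_vectors(counts, global_w), list(labels))
    return predict(w, to_vectors(term_counts(test_docs), global_w))


if __name__ == "__main__":
    for scheme in ("tf", "tfidf", "tfrf"):
        print(scheme, run(scheme, ["teamwork collaboration learning today",
                                   "alone late morning"]))
```

Both IDF and RF are global statistics estimated from the training split only; RF additionally uses the labels, which is what makes it a supervised weighting. On this tiny, linearly separable toy corpus every scheme should label the two held-out snippets as `[1, -1]`, so the example says nothing about the accuracy differences reported in the abstract.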

Keywords


Recruitment Process, Feature Extraction, Support Vector Machine, Term Frequency-Inverse Document Frequency, Term Frequency-Relevance Frequency

Copyright (c) 2020 JURNAL MEDIA INFORMATIKA BUDIDARMA

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.



JURNAL MEDIA INFORMATIKA BUDIDARMA
STMIK Budi Darma
Secretariat: Jln. Sisingamangaraja No. 338, Tel. 061-7875998
Email: mib.stmikbd@gmail.com

This work is licensed under a Creative Commons Attribution 4.0 International License.