Implementation of TF-IDF Method and Support Vector Machine Algorithm for Job Applicants Text Classification

Muhammad Faris Luthfi; Kemas Muslim Lhaksamana

doi:10.30865/mib.v4i4.2276

Authors

Muhammad Faris Luthfi, Universitas Telkom, Bandung
Kemas Muslim Lhaksamana, Universitas Telkom, Bandung

Keywords:

Recruitment Process, Feature Extraction, Support Vector Machine, Term Frequency-Inverse Document Frequency, Term Frequency-Relevance Frequency

Abstract

Tens of thousands of people apply for jobs at PT. Telkom each year. The goal of the recruitment process is to find new employees who fit PT. Telkom's working culture. Because of the high number of applicants, the recruitment process takes a long time and incurs high costs. We propose the widely used combination of Term Frequency-Inverse Document Frequency (TF-IDF) as the feature extraction method and Support Vector Machine (SVM) as the classifier to filter applicants based on their interview texts. SVM generally produces better accuracy in text classification than the Random Forest or K-Nearest Neighbors (KNN) algorithms. However, several variants of TF-IDF have been developed to address its weaknesses, one of which is Term Frequency-Relevance Frequency (TF-RF). For comparison, this study uses three extraction methods: TF only (without IDF), TF-IDF, and TF-RF. We use interview texts from PT. Telkom as the data source. SVM combined with TF-IDF achieves an accuracy of 86.31%, with TF only 85.06%, and with TF-RF 83.61%. The results show that TF-IDF still outperforms TF-RF in terms of accuracy.
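To make the three extraction methods concrete, the following is a minimal sketch of how TF, TF-IDF, and TF-RF weights could be computed for a small labelled corpus. It is not the authors' implementation: the abstract does not state which formula variants were used, so the sketch assumes raw term counts for TF, the unsmoothed form log(N / df) for IDF, and the commonly cited relevance frequency rf = log2(2 + a / max(1, c)), where a and c are the numbers of positive-class and negative-class documents containing the term. The function name `term_weights`, the toy sentences, and the labels `accept` and `reject` are hypothetical. The SVM step is omitted; in practice the resulting weight vectors would be passed to a classifier.

```python
import math
from collections import Counter


def term_weights(docs, labels, positive):
    """Return per-document weight dicts for TF, TF-IDF and TF-RF.

    Assumed variants (not confirmed by the source):
      TF     = raw count of the term in the document
      TF-IDF = TF * ln(N / df)
      TF-RF  = TF * log2(2 + a / max(1, c))
    """
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]

    df = Counter()      # number of documents containing the term
    pos_df = Counter()  # number of positive-class documents containing it (a)
    for tokens, label in zip(tokenized, labels):
        for term in set(tokens):
            df[term] += 1
            if label == positive:
                pos_df[term] += 1

    weights = {"tf": [], "tfidf": [], "tfrf": []}
    for tokens in tokenized:
        tf = Counter(tokens)
        weights["tf"].append(dict(tf))
        weights["tfidf"].append(
            {t: f * math.log(n_docs / df[t]) for t, f in tf.items()}
        )
        # c = df - a is the count of negative-class documents with the term
        weights["tfrf"].append(
            {
                t: f * math.log2(2 + pos_df[t] / max(1, df[t] - pos_df[t]))
                for t, f in tf.items()
            }
        )
    return weights


# Hypothetical toy corpus standing in for interview texts.
docs = ["team work team", "work alone", "team player", "alone work"]
labels = ["accept", "reject", "accept", "reject"]
weights = term_weights(docs, labels, positive="accept")

for name in ("tf", "tfidf", "tfrf"):
    print(name, weights[name][0])
```

In this toy corpus, a term that appears only in the positive class ("team") receives a larger TF-RF weight than one spread across both classes ("work"), whereas IDF ignores the class labels and only reflects how many documents contain the term.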

Author Biographies

Muhammad Faris Luthfi, Universitas Telkom, Bandung

Faculty of Informatics

Kemas Muslim Lhaksamana, Universitas Telkom, Bandung

Faculty of Informatics
