Comparison of the Random Forest Classifier, Support Vector Machine, and Logistic Regression Classifier Algorithms on High-Dimension Problems (Case Study: Fake News Classification)

Authors

  • Willy, Universitas Sriwijaya, Palembang
  • Dian Palupi Rini, Universitas Sriwijaya, Palembang
  • Samsuryadi, Universitas Sriwijaya, Palembang

DOI:

https://doi.org/10.30865/mib.v5i4.3177

Keywords:

Fake News, High Dimension, Dataset, RFC, SVM, LR

Abstract

Fake news is false information that appears to be true. It can also serve as a political weapon: claims whose truth cannot be verified, spread deliberately to achieve a particular goal. Classifying news texts requires computing a weight for every word in each document, so the number of feature dimensions equals the number of distinct words. The more words in a document collection, the higher the dimensionality of each data point (the high-dimension problem). High dimensionality makes model training slow and visibly degrades measures of document similarity. The dataset used in this study contains 20,000 records with 17 attributes. The study applies a Random Forest Classifier (RFC), Support Vector Machine (SVM), and Logistic Regression (LR) to the high-dimensional data and compares the accuracy obtained by each method.
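The pipeline the abstract describes, where each distinct word becomes one feature dimension and three classifiers are compared on accuracy, can be sketched as follows. This is a minimal illustration with scikit-learn, assuming TF-IDF bag-of-words features; the tiny corpus, hyperparameters, and train/test split below are placeholders, not the paper's 20,000-record dataset or its actual configuration.

```python
# Hedged sketch: compare RFC, SVM, and LR on high-dimensional text features.
# The corpus here is an illustrative stand-in for the paper's fake-news dataset.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

fake = ["shocking secret cure doctors hate", "unbelievable miracle trick revealed"] * 5
real = ["city council approves annual budget", "central bank holds interest rate steady"] * 5
texts = fake + real
labels = [1] * len(fake) + [0] * len(real)

# One TF-IDF feature per distinct word: dimensionality equals vocabulary size,
# which is the "high dimension" problem described in the abstract.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, random_state=42, stratify=labels
)

models = {
    "RFC": RandomForestClassifier(n_estimators=100, random_state=42),
    "SVM": SVC(kernel="linear", random_state=42),
    "LR": LogisticRegression(max_iter=1000),
}
accuracies = {
    name: accuracy_score(y_test, model.fit(X_train, y_train).predict(X_test))
    for name, model in models.items()
}
print(accuracies)
```

All three estimators accept the sparse TF-IDF matrix directly, which matters in practice because densifying a vocabulary-sized feature matrix for 20,000 documents would be memory-prohibitive.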

References

Y. Y. Chen, S.-P. Yong, and A. Ishak, “Email Hoax Detection System Using Levenshtein Distance Method,” JCP, vol. 9, no. 2, pp. 441–446, 2014.

C. D. MacDougall, Hoaxes, vol. 465. Dover Publications, 1958.

H. Berghel, “Alt-News and Post-Truths in the ‘Fake News’ Era,” Computer, vol. 50, no. 4, pp. 110–114, 2017.

T. Petkovic, Z. Kostanjcar, and P. Pale, “E-mail system for automatic hoax recognition,” in 27th MIPRO International Conference, 2005, pp. 117–121.

P. Faustini and T. Covões, “Fake news detection using one-class classification,” in 2019 8th Brazilian Conference on Intelligent Systems (BRACIS), 2019, pp. 592–597.

S. Zhang, Y. Wang, and C. Tan, “Research on text classification for identifying fake news,” in 2018 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), 2018, pp. 178–181.

K. Shah, H. Patel, D. Sanghvi, and M. Shah, “A comparative analysis of logistic regression, random forest and KNN models for the text classification,” Augment. Hum. Res., vol. 5, no. 1, pp. 1–16, 2020.

H. T. Sueno, B. D. Gerardo, and R. P. Medina, “Multi-class document classification using support vector machine (SVM) based on improved Naïve Bayes vectorization technique,” Int. J. Adv. Trends Comput. Sci. Eng., vol. 9, no. 3, 2020.

D. M. J. Lazer et al., “The science of fake news,” Science, vol. 359, no. 6380, pp. 1094–1096, 2018, doi: 10.1126/science.aao2998.

A. Choudhary and A. Arora, “Linguistic feature based learning model for fake news detection and classification,” Expert Syst. Appl., vol. 169, p. 114171, 2021, doi: 10.1016/j.eswa.2020.114171.

T. G. Dietterich, “Ensemble methods in machine learning,” in International Workshop on Multiple Classifier Systems, 2000, pp. 1–15.

T. K. Ho, “Random decision forests,” Proc. 3rd Int. Conf. Doc. Anal. Recognit., pp. 278–282, 1995.

R. Richman and M. V. Wüthrich, “Nagging predictors,” Risks, vol. 8, no. 3, pp. 1–26, 2020, doi: 10.3390/risks8030083.

Z. Jin, J. Shang, Q. Zhu, C. Ling, W. Xie, and B. Qiang, “RFRSF: Employee Turnover Prediction Based on Random Forests and Survival Analysis,” Lect. Notes Comput. Sci., vol. 12343 LNCS, pp. 503–515, 2020, doi: 10.1007/978-3-030-62008-0_35.

V. F. Rodriguez-Galiano, B. Ghimire, J. Rogan, M. Chica-Olmo, and J. P. Rigol-Sanchez, “An assessment of the effectiveness of a random forest classifier for land-cover classification,” ISPRS J. Photogramm. Remote Sens., vol. 67, no. 1, pp. 93–104, 2012, doi: 10.1016/j.isprsjprs.2011.11.002.

V. N. Vapnik, The Nature of Statistical Learning Theory (Statistics for Engineering and Information Science). New York: Springer Science+Business Media, 2000.

V. N. Vapnik, “An overview of statistical learning theory,” IEEE Trans. Neural Networks, vol. 10, no. 5, pp. 988–999, 1999, doi: 10.1109/72.788640.

H. L. Chen, B. Yang, J. Liu, and D. Y. Liu, “A support vector machine classifier with rough set-based feature selection for breast cancer diagnosis,” Expert Syst. Appl., vol. 38, no. 7, pp. 9014–9022, 2011, doi: 10.1016/j.eswa.2011.01.120.

N. H. Farhat, “Photonic neural networks and learning machines: the role of electron-trapping materials,” IEEE Expert, vol. 7, no. 5, pp. 63–72, 1992, doi: 10.1109/64.163674.

M. Seeger, “Gaussian processes for machine learning,” Int. J. Neural Syst., vol. 14, no. 2, pp. 69–106, 2004, doi: 10.1142/S0129065704001899.

D. Matić, F. Kulić, M. Pineda-Sánchez, and I. Kamenko, “Support vector machine classifier for diagnosis in electrical machines: Application to broken bar,” Expert Syst. Appl., vol. 39, no. 10, pp. 8681–8689, 2012, doi: 10.1016/j.eswa.2012.01.214.

U. Ravale, N. Marathe, and P. Padiya, “Feature selection based hybrid anomaly intrusion detection system using K Means and RBF kernel function,” Procedia Comput. Sci., vol. 45, no. C, pp. 428–435, 2015, doi: 10.1016/j.procs.2015.03.174.

J. S. Cramer, “The Origins of Logistic Regression,” SSRN Electron. J., 2005, doi: 10.2139/ssrn.360300.

P. Tsangaratos and I. Ilia, “Comparison of a logistic regression and Naïve Bayes classifier in landslide susceptibility assessments: The influence of models complexity and training dataset size,” Catena, vol. 145, pp. 164–179, 2016, doi: 10.1016/j.catena.2016.06.004.

R. Zakharov and P. Dupont, “Ensemble logistic regression for feature selection,” in Pattern Recognition in Bioinformatics, 2011, doi: 10.1007/978-3-642-24855-9.

Published

2021-10-26