Comparison of the Random Forest Classifier, Support Vector Machine, and Logistic Regression Classifier Algorithms on High-Dimension Problems (Case Study: Fake News Classification)

Authors

  • Willy, Universitas Sriwijaya, Palembang
  • Dian Palupi Rini, Universitas Sriwijaya, Palembang
  • Samsuryadi, Universitas Sriwijaya, Palembang

DOI:

https://doi.org/10.30865/mib.v5i4.3177

Keywords:

Fake News, High Dimension, Dataset, RFC, SVM, LR

Abstract

Fake news is false information that appears to be true. It can also serve as a political weapon: claims whose truth cannot be verified, spread deliberately to achieve a particular goal. Classifying news texts requires computing a weight for every word in each document, so the number of feature dimensions equals the number of distinct words. The more words in a document collection, the higher the dimensionality of each data point (the high-dimension problem). High dimensionality makes model training slow and visibly degrades measures of document similarity. The dataset used in this study contains 20,000 records with 17 attributes. The study applies a Random Forest Classifier (RFC), Support Vector Machine (SVM), and Logistic Regression (LR) to the high-dimensional data and compares the accuracy obtained by each method.
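The pipeline the abstract describes, where each distinct word becomes one feature dimension and three classifiers are compared on accuracy, can be sketched as follows. This is a minimal illustration with scikit-learn, assuming TF-IDF bag-of-words features; the tiny corpus, hyperparameters, and train/test split below are placeholders, not the paper's 20,000-record dataset or its actual configuration.

```python
# Hedged sketch: compare RFC, SVM, and LR on high-dimensional text features.
# The corpus here is an illustrative stand-in for the paper's fake-news dataset.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

fake = ["shocking secret cure doctors hate", "unbelievable miracle trick revealed"] * 5
real = ["city council approves annual budget", "central bank holds interest rate steady"] * 5
texts = fake + real
labels = [1] * len(fake) + [0] * len(real)

# One TF-IDF feature per distinct word: dimensionality equals vocabulary size,
# which is the "high dimension" problem described in the abstract.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, random_state=42, stratify=labels
)

models = {
    "RFC": RandomForestClassifier(n_estimators=100, random_state=42),
    "SVM": SVC(kernel="linear", random_state=42),
    "LR": LogisticRegression(max_iter=1000),
}
accuracies = {
    name: accuracy_score(y_test, model.fit(X_train, y_train).predict(X_test))
    for name, model in models.items()
}
print(accuracies)
```

All three estimators accept the sparse TF-IDF matrix directly, which matters in practice because densifying a vocabulary-sized feature matrix for 20,000 documents would be memory-prohibitive.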

References

Y. Y. Chen, S.-P. Yong, and A. Ishak, “Email Hoax Detection System Using Levenshtein Distance Method,” JCP, vol. 9, no. 2, pp. 441–446, 2014.

C. D. MacDougall, Hoaxes, vol. 465. Dover Publications, 1958.

H. Berghel, “Alt-News and Post-Truths in the ‘Fake News’ Era,” Computer, vol. 50, no. 4, pp. 110–114, 2017.

T. Petkovic, Z. Kostanjcar, and P. Pale, “E-mail system for automatic hoax recognition,” in 27th MIPRO International Conference, 2005, pp. 117–121.

P. Faustini and T. Covões, “Fake news detection using one-class classification,” in 2019 8th Brazilian Conference on Intelligent Systems (BRACIS), 2019, pp. 592–597.

S. Zhang, Y. Wang, and C. Tan, “Research on text classification for identifying fake news,” in 2018 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), 2018, pp. 178–181.

K. Shah, H. Patel, D. Sanghvi, and M. Shah, “A comparative analysis of logistic regression, random forest and KNN models for the text classification,” Augment. Hum. Res., vol. 5, no. 1, pp. 1–16, 2020.

H. T. Sueno, B. D. Gerardo, and R. P. Medina, “Multi-class document classification using support vector machine (SVM) based on improved Naïve Bayes vectorization technique,” Int. J. Adv. Trends Comput. Sci. Eng., vol. 9, no. 3, 2020.

D. M. J. Lazer et al., “The science of fake news,” Science, vol. 359, no. 6380, pp. 1094–1096, 2018, doi: 10.1126/science.aao2998.

A. Choudhary and A. Arora, “Linguistic feature based learning model for fake news detection and classification,” Expert Syst. Appl., vol. 169, p. 114171, 2021, doi: 10.1016/j.eswa.2020.114171.

T. G. Dietterich, “Ensemble methods in machine learning,” in International Workshop on Multiple Classifier Systems, 2000, pp. 1–15.

T. K. Ho, “Random decision forests,” Proc. 3rd Int. Conf. Doc. Anal. Recognit., pp. 278–282, 1995.

R. Richman and M. V. Wüthrich, “Nagging predictors,” Risks, vol. 8, no. 3, pp. 1–26, 2020, doi: 10.3390/risks8030083.

Z. Jin, J. Shang, Q. Zhu, C. Ling, W. Xie, and B. Qiang, “RFRSF: Employee Turnover Prediction Based on Random Forests and Survival Analysis,” Lect. Notes Comput. Sci., vol. 12343 LNCS, pp. 503–515, 2020, doi: 10.1007/978-3-030-62008-0_35.

V. F. Rodriguez-Galiano, B. Ghimire, J. Rogan, M. Chica-Olmo, and J. P. Rigol-Sanchez, “An assessment of the effectiveness of a random forest classifier for land-cover classification,” ISPRS J. Photogramm. Remote Sens., vol. 67, no. 1, pp. 93–104, 2012, doi: 10.1016/j.isprsjprs.2011.11.002.

V. N. Vapnik, The Nature of Statistical Learning Theory (Statistics for Engineering and Information Science). New York: Springer Science+Business Media, 2000.

V. N. Vapnik, “An overview of statistical learning theory,” IEEE Trans. Neural Networks, vol. 10, no. 5, pp. 988–999, 1999, doi: 10.1109/72.788640.

H. L. Chen, B. Yang, J. Liu, and D. Y. Liu, “A support vector machine classifier with rough set-based feature selection for breast cancer diagnosis,” Expert Syst. Appl., vol. 38, no. 7, pp. 9014–9022, 2011, doi: 10.1016/j.eswa.2011.01.120.

N. H. Farhat, “Photonic neural networks and learning machines: the role of electron-trapping materials,” IEEE Expert, vol. 7, no. 5, pp. 63–72, 1992, doi: 10.1109/64.163674.

M. Seeger, “Gaussian processes for machine learning,” Int. J. Neural Syst., vol. 14, no. 2, pp. 69–106, 2004, doi: 10.1142/S0129065704001899.

D. Matić, F. Kulić, M. Pineda-Sánchez, and I. Kamenko, “Support vector machine classifier for diagnosis in electrical machines: Application to broken bar,” Expert Syst. Appl., vol. 39, no. 10, pp. 8681–8689, 2012, doi: 10.1016/j.eswa.2012.01.214.

U. Ravale, N. Marathe, and P. Padiya, “Feature selection based hybrid anomaly intrusion detection system using K Means and RBF kernel function,” Procedia Comput. Sci., vol. 45, no. C, pp. 428–435, 2015, doi: 10.1016/j.procs.2015.03.174.

J. S. Cramer, “The Origins of Logistic Regression,” SSRN Electron. J., 2005, doi: 10.2139/ssrn.360300.

P. Tsangaratos and I. Ilia, “Comparison of a logistic regression and Naïve Bayes classifier in landslide susceptibility assessments: The influence of models complexity and training dataset size,” Catena, vol. 145, pp. 164–179, 2016, doi: 10.1016/j.catena.2016.06.004.

R. Zakharov and P. Dupont, “Ensemble logistic regression for feature selection,” in Pattern Recognition in Bioinformatics, 2011, doi: 10.1007/978-3-642-24855-9.

Published

2021-10-26