Comparison of the Random Forest Classifier, Support Vector Machine, and Logistic Regression Classifier Algorithms on High-Dimension Problems (Case Study: Fake News Classification)
DOI:
https://doi.org/10.30865/mib.v5i4.3177
Keywords:
Fake News, High Dimension, Dataset, RFC, SVM, LR
Abstract
Fake news is false information presented so that it appears true. It can also serve as a political weapon: content whose truth cannot be verified and that is spread intentionally to achieve a particular goal. Classifying news text requires a computation for every word in a document. Because each word is processed per document, the number of data dimensions equals the number of words, so the more words a document contains, the more dimensions each data point has (high dimension). A large number of dimensions makes model training slow, and its drawbacks also show clearly when measuring document similarity. The dataset used in this study consists of 20,000 records with 17 attributes. The study applies Random Forest Classifier (RFC), Support Vector Machine (SVM), and Logistic Regression (LR) to this high-dimensional data, and its result is a comparison of the accuracy values obtained by each method.
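The link between word count and dimensionality described in the abstract can be illustrated with a minimal bag-of-words sketch. This is not the authors' pipeline: the tokenizer, the term-count weighting, the helper names (`tokenize`, `build_vocabulary`, `vectorize`, `cosine_similarity`), and the three toy headlines are all assumptions made for illustration. It only shows that every distinct word adds one dimension and how document similarity is computed over those dimensions.

```python
from collections import Counter
import math

PUNCTUATION = ".,!?\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation (a simplifying assumption)."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def build_vocabulary(documents):
    """Map each distinct word across all documents to one column index."""
    vocab = sorted({word for doc in documents for word in tokenize(doc)})
    return {word: i for i, word in enumerate(vocab)}


def vectorize(text, vocab):
    """Turn one document into a term-count vector with one dimension per vocabulary word."""
    vector = [0] * len(vocab)
    for word, count in Counter(tokenize(text)).items():
        if word in vocab:
            vector[vocab[word]] = count
    return vector


def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical toy headlines, not taken from the study's dataset.
docs = [
    "Government confirms new election date",
    "Government denies new election date rumor",
    "Scientists discover water on distant planet",
]

vocab = build_vocabulary(docs)
vectors = [vectorize(d, vocab) for d in docs]

# The number of dimensions equals the number of distinct words seen.
print("dimensions:", len(vocab))
print("similarity(doc 0, doc 1):", cosine_similarity(vectors[0], vectors[1]))
print("similarity(doc 0, doc 2):", cosine_similarity(vectors[0], vectors[2]))
```

With only three short headlines the vocabulary is already larger than any single document, and most entries in each vector are zero. On a corpus of thousands of articles the same construction yields vectors with many thousands of mostly empty dimensions, which is the high-dimension setting the study examines.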
License

This work is licensed under a Creative Commons Attribution 4.0 International License