Comparison of Support Vector Machine and Random Forest Method on Static Analysis Windows Portable Executable (PE) Malware Detection

Hazim Ismail; Rio Guntur Utomo; Marastika Wicaksono Aji Bawono

doi:10.30865/mib.v8i1.7110

Authors

Hazim Ismail, Telkom University, Bandung
Rio Guntur Utomo, Telkom University, Bandung
Marastika Wicaksono Aji Bawono, Telkom University, Bandung

DOI:

https://doi.org/10.30865/mib.v8i1.7110

Keywords:

Malware Detection, Support Vector Machine, Random Forest, Machine Learning, Windows Portable Executable

Abstract

Malware is a significant threat to computer system security: it spreads rapidly and degrades system performance. Detecting it has therefore become crucial, and one approach is machine learning classification based on static analysis, which learns the characteristics of an application without executing it. This study evaluates malware detection through static analysis of Windows Portable Executable (PE) files using the Support Vector Machine (SVM) and Random Forest algorithms. Both models are trained on a dataset containing malicious and safe PE files to classify each file as either malware or safe. The objective is to determine which of the two algorithms is more effective for malware detection in PE files by comparing their performance. The results indicate that Random Forest achieves an accuracy of 98.53%, while SVM scores slightly lower at 97.14%.
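The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the study's pipeline: it assumes scikit-learn, uses synthetic data from `make_classification` as a stand-in for static PE features, and the function name `compare_classifiers`, the 80/20 split, the RBF kernel, and the 100-tree forest are all assumptions chosen for the example. The accuracies it prints will not match the reported figures.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_classifiers(X, y, test_size=0.2, seed=0):
    """Train an SVM and a Random Forest on the same split; return test accuracy per model."""
    # Stratified split keeps the malware/safe ratio equal in both partitions.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    models = {
        # SVMs are sensitive to feature scale, so standardize inside the pipeline.
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic binary data standing in for extracted PE features (assumption):
# label 1 plays the role of "malware", label 0 the role of "safe".
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=10, random_state=0
)
scores = compare_classifiers(X, y)
for name, acc in scores.items():
    print(f"{name}: {acc:.4f}")
```

To apply the same comparison to real samples, `X` would be replaced by a feature matrix extracted statically from PE files and `y` by the corresponding malware/safe labels; accuracy alone can be misleading on imbalanced data, so a confusion matrix is a natural companion metric.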
