Penerapan Word2Vec dan SVM dengan Hyperparameter Tuning untuk Deteksi Phishing (Application of Word2Vec and SVM with Hyperparameter Tuning for Phishing Detection)
DOI: https://doi.org/10.30865/jurikom.v12i3.8729
Keywords: Hyperparameter Tuning, Machine Learning, Phishing, Support Vector Machine, Word2Vec
Abstract
Information technology is advancing rapidly, and this growth is accompanied by rising cybersecurity threats such as phishing. Phishing links are often built with uniform resource locator (URL) structures that look convincing and are hard to distinguish from genuine links. This research proposes combining word-to-vector (Word2Vec) embeddings with a Support Vector Machine (SVM) and hyperparameter tuning: Word2Vec is a word embedding technique that produces a vector representation of a given URL, SVM is the machine learning (ML) classifier, and hyperparameter tuning searches for the parameter combination that yields the best-performing SVM model for phishing detection. The study has two aims: to compare the performance of the optimized SVM and XGBoost models, and to deploy the ML models in a prediction system, built with the Streamlit framework, that detects phishing from a URL entered by the user. The SVM model outperformed the XGBoost model, with precision, recall, F1-score, and accuracy each at about 99.84%, against about 99.70% each for XGBoost. The SVM model is therefore the better of the two models for detecting phishing in this study.
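As a rough illustration of two steps the abstract mentions, the sketch below shows how a URL might be split into tokens before being passed to a Word2Vec model, and how precision, recall, F1-score, and accuracy can be computed for a binary phishing classifier. It uses only the Python standard library. The function names `tokenize_url` and `evaluate`, the tokenisation rule, and the toy labels are assumptions made for this example; the abstract does not describe the authors' preprocessing or evaluation code.

```python
import re
from dataclasses import dataclass


def tokenize_url(url):
    """Split a URL into lowercase alphanumeric tokens.

    Assumed rule (not taken from the paper): every run of
    non-alphanumeric characters acts as a separator.
    """
    parts = re.split(r"[^a-z0-9]+", url.lower())
    return [p for p in parts if p]  # drop empty strings


@dataclass
class Metrics:
    precision: float
    recall: float
    f1: float
    accuracy: float


def evaluate(y_true, y_pred, positive=1):
    """Compute binary classification metrics, treating `positive` as phishing."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # phishing correctly flagged
        elif pred == positive:
            fp += 1  # genuine URL flagged as phishing
        elif truth == positive:
            fn += 1  # phishing URL missed
        else:
            tn += 1  # genuine URL correctly passed
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return Metrics(precision, recall, f1, accuracy)


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    print(tokenize_url("http://secure-login.example.com/verify?id=42"))

    # Toy labels (1 = phishing, 0 = genuine); not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 1]
    print(evaluate(y_true, y_pred))
```

In a full pipeline, the token lists would typically be used to train an embedding model, the token vectors for each URL would be pooled (for example, averaged) into one feature vector, and those vectors would be fed to the classifier whose predictions are then scored as above.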
License
Copyright (c) 2025 Hilman Singgih Wicaksana, Khairul Huda

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.



