Optimizing a Linear Support Vector Machine for Multi-Class Smishing Detection on Imbalanced Datasets
DOI:
https://doi.org/10.30865/json.v7i2.9299
Keywords:
Smishing; Detection; Linear SVM; Hybrid Sampling; Class Imbalance
Abstract
Smishing (SMS phishing) poses a fundamental challenge for machine-learning-based detection because real-world datasets have imbalanced class distributions, and the minority class (smishing) is precisely the one that matters most to identify. This study proposes a robust framework that optimizes a Linear Support Vector Machine (SVM) with a three-level hybrid sampling strategy for multi-class classification on imbalanced data. The framework combines hybrid feature extraction (TF-IDF plus meta-features) with a comprehensive imbalance-handling strategy: Random Oversampling (ROS) for the minority class, Random Undersampling (RUS) for the majority class, and Embedding MixUp for embedding-level data augmentation. Hyperparameter tuning with GridSearchCV and 5-fold cross-validation selected C=0.5 as the optimal Linear SVM configuration. Evaluation on the test set shows high and balanced classification performance, with 96.7% accuracy and 87.6% macro-F1. This consistency across classes is reflected in a smishing recall of 84% while ham recall remains at 99%. These findings indicate that combining a Linear SVM with a hybrid sampling strategy yields a smishing detection model that is robust, balanced, and ready for deployment in real-world scenarios.
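The three sampling levels named in the abstract can be illustrated with a minimal standard-library sketch. This is not the authors' implementation. The helper names `hybrid_resample` and `mixup_same_class`, the toy feature vectors, the third class label `spam`, the per-class target of 8, and the Beta parameter `alpha=0.4` are all assumptions made for illustration. Restricting MixUp to pairs from the same class is also an assumption, chosen here so that the synthetic vectors keep a hard label that an SVM can train on.

```python
import random
from collections import Counter, defaultdict


def hybrid_resample(X, y, target, seed=0):
    """Bring every class to `target` rows: ROS for small classes, RUS for large ones."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)

    X_out, y_out = [], []
    for label, rows in by_class.items():
        if len(rows) >= target:
            # Random undersampling: keep `target` rows, drawn without replacement.
            chosen = rng.sample(rows, target)
        else:
            # Random oversampling: keep all rows, then duplicate randomly chosen ones.
            chosen = rows + rng.choices(rows, k=target - len(rows))
        X_out.extend(chosen)
        y_out.extend([label] * target)
    return X_out, y_out


def mixup_same_class(X, y, label, n_new, alpha=0.4, seed=0):
    """Create `n_new` synthetic vectors by interpolating random pairs from one class."""
    rng = random.Random(seed)
    rows = [x for x, lab in zip(X, y) if lab == label]
    synthetic = []
    for _ in range(n_new):
        a, b = rng.choice(rows), rng.choice(rows)
        lam = rng.betavariate(alpha, alpha)  # mixing weight between 0 and 1
        synthetic.append([lam * u + (1 - lam) * v for u, v in zip(a, b)])
    return synthetic


# Toy, hypothetical data: 20 ham, 4 spam, 2 smishing rows with 2 features each.
X = [[float(i), float(i % 3)] for i in range(26)]
y = ["ham"] * 20 + ["spam"] * 4 + ["smishing"] * 2

X_bal, y_bal = hybrid_resample(X, y, target=8)
counts = Counter(y_bal)
extra = mixup_same_class(X, y, "smishing", n_new=3)
print(counts)
print(extra)
```

In a real pipeline the resampling would be applied to the training folds only, so that the test set keeps its original class distribution and metrics such as macro-F1 and per-class recall remain meaningful.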
License
Copyright (c) 2025 Jurnal Sistem Komputer dan Informatika (JSON)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

This work is licensed under a Creative Commons Attribution 4.0 International License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).

