Random Oversampling, Chi-Square, and AdaBoost for Handling Class Imbalance in C5.0 Classification
DOI: https://doi.org/10.30865/mib.v7i2.5862
Copyright (c) 2023 JURNAL MEDIA INFORMATIKA BUDIDARMA

This work is licensed under a Creative Commons Attribution 4.0 International License.
JURNAL MEDIA INFORMATIKA BUDIDARMA
Universitas Budi Darma
Secretariat: Sisingamangaraja No. 338 Telp 061-7875998
Email: mib.stmikbd@gmail.com
