Improving the MWMOTE Oversampling Technique for Handling Imbalanced Data

Pramana Yoga Saputra, Moch Zawaruddin Abdullah, Annisa Puspa Kirana

Abstract


Imbalanced data is a condition in which the number of samples differs greatly between classes, producing a majority class (a class with very many members) and a minority class (a class with very few members). This imbalance complicates classification because most machine learning algorithms are designed to classify data that is already balanced. Oversampling resolves the imbalance by adding synthetic data to the minority class until it has the same amount of data as the majority class. MWMOTE is an oversampling technique that generates synthetic data from members of minority class clusters that lie close to the majority class. The approach generates synthetic data well, but the resulting synthetic data stays near the majority region and is too dense at the cluster borders, so it can be classified as belonging to the majority class. This study aims to improve the synthetic data generation process in MWMOTE so that the resulting data is spread widely within the minority class. The test results show that the proposed method improves classification performance for KNN and C4.5 Decision Tree by 0.46% and 0.96%, respectively, compared to MWMOTE.
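The general oversampling idea in the abstract, adding synthetic minority samples until the minority class matches the majority class in size, can be sketched as follows. This is a minimal illustration only: it uses plain SMOTE-style interpolation between nearest minority neighbours, not MWMOTE and not the method proposed in this study. The function name `oversample_minority`, the parameters `k` and `seed`, and the toy data are all hypothetical, and the sketch assumes at least two minority samples.

```python
import random


def oversample_minority(minority, n_majority, k=3, seed=0):
    """Return synthetic points so the minority class reaches n_majority members.

    Each synthetic point is a random interpolation between a minority sample
    and one of its k nearest minority neighbours (SMOTE-style, not MWMOTE).
    Assumes the minority class has at least two samples.
    """
    rng = random.Random(seed)
    n_needed = max(0, n_majority - len(minority))
    synthetic = []
    for _ in range(n_needed):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Toy data (hypothetical): 4 minority samples, majority class of size 10.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority_size = 10
new_points = oversample_minority(minority, majority_size)
balanced_minority = minority + new_points
print(len(balanced_minority))  # 10
```

Because every synthetic point lies on a segment between two existing minority samples, this simple scheme never leaves the region spanned by the minority class. It does not weight samples by their closeness to the majority class, which is the part that MWMOTE adds.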

Keywords


Improvising; Oversampling; MWMOTE; Imbalance Data; Data Mining





DOI: https://doi.org/10.30865/mib.v5i2.2811



Copyright (c) 2021 JURNAL MEDIA INFORMATIKA BUDIDARMA

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.



JURNAL MEDIA INFORMATIKA BUDIDARMA
Universitas Budi Darma
Secretariat: Sisingamangaraja No. 338 Telp 061-7875998
Email: mib.stmikbd@gmail.com
