Improvisasi Teknik Oversampling MWMOTE Untuk Penanganan Data Tidak Seimbang (Improving the MWMOTE Oversampling Technique for Handling Imbalanced Data)

Pramana Yoga Saputra; Moch Zawaruddin Abdullah; Annisa Puspa Kirana

doi:10.30865/mib.v5i2.2811

Authors

Pramana Yoga Saputra, Politeknik Negeri Malang, Malang
Moch Zawaruddin Abdullah, Politeknik Negeri Malang, Malang, http://orcid.org/0000-0002-8857-9744
Annisa Puspa Kirana, Politeknik Negeri Malang, Malang

Keywords:

Improvising, Oversampling, MWMOTE, Imbalance Data, Data Mining

Abstract

Imbalanced data is a condition in which the classes differ greatly in size, producing a majority class (a class with many members) and a minority class (a class with very few members). This complicates classification, because most machine learning classification algorithms are designed for data with a balanced class distribution. Oversampling addresses the imbalance by adding synthetic data to the minority class until it has as many samples as the majority class. MWMOTE is an oversampling technique that generates synthetic data from members of minority-class clusters that lie close to the majority class. Although this approach generates synthetic data well, the resulting synthetic samples stay near the majority region and are too dense at the cluster borders, which allows some of them to be classified as majority-class data. This study aims to improve the synthetic data generation process in MWMOTE so that the generated data is spread more widely within the minority class. Test results show that the proposed method improves classification performance for the KNN and C4.5 decision tree classifiers by 0.46% and 0.96%, respectively, compared with MWMOTE.
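The abstract describes MWMOTE only in outline. The sketch below shows the baseline MWMOTE generation procedure (Barua et al., 2014) as we understand it, using only NumPy: filter noisy minority samples, find borderline majority samples, find informative minority samples, weight them by closeness and density, cluster the minority class, and interpolate within clusters. It is not the authors' improved method, whose changes the abstract does not specify. The function names (`pairwise_dist`, `average_linkage_clusters`, `mwmote_sketch`) are hypothetical, and the parameter defaults (`k1`, `k2`, `k3`, `cf_th`, `cmax`, `cp`) reflect our reading of the original paper and should be checked against it. The naive clustering loop is only suitable for small minority classes.

```python
import numpy as np


def pairwise_dist(A, B):
    """Euclidean distance matrix between the rows of A and the rows of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)


def average_linkage_clusters(X, threshold):
    """Naive agglomerative clustering: repeatedly merge the two closest clusters
    (average linkage) until the closest pair is farther apart than `threshold`."""
    D = pairwise_dist(X, X)
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > 1:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if d < best:
                    best, pair = d, (a, b)
        if best > threshold:
            break
        a, b = pair  # a < b, so popping b leaves index a valid
        merged = clusters.pop(b)
        clusters[a].extend(merged)
    labels = np.empty(len(X), dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels


def mwmote_sketch(X_maj, X_min, n_synthetic, k1=5, k2=3, k3=None,
                  cf_th=5.0, cmax=2.0, cp=3.0, rng=None):
    """Hypothetical sketch of baseline MWMOTE; defaults are assumptions."""
    rng = np.random.default_rng(rng)
    n_min, dim = X_min.shape
    X_all = np.vstack([X_min, X_maj])  # rows < n_min are minority samples

    # Step 1: drop minority samples whose k1 nearest neighbours are all majority (noise).
    D_all = pairwise_dist(X_min, X_all)
    D_all[np.arange(n_min), np.arange(n_min)] = np.inf  # exclude the point itself
    nn = np.argsort(D_all, axis=1)[:, :k1]
    X_minf = X_min[(nn < n_min).any(axis=1)]
    k3 = k3 or max(1, len(X_minf) // 2)

    # Step 2: borderline majority set = k2 nearest majority samples of each kept minority sample.
    bmaj = np.unique(np.argsort(pairwise_dist(X_minf, X_maj), axis=1)[:, :k2])

    # Step 3: informative minority set = k3 nearest kept minority samples of each borderline majority sample.
    D_bm = pairwise_dist(X_maj[bmaj], X_minf)
    near_min = np.argsort(D_bm, axis=1)[:, :k3]
    imin = np.unique(near_min)

    # Step 4: closeness factor Cf (capped inverse of distance / dim) and density factor Df.
    cf = np.zeros((len(bmaj), len(X_minf)))
    for r in range(len(bmaj)):
        for c in near_min[r]:
            closeness = min(dim / max(D_bm[r, c], 1e-12), cf_th)
            cf[r, c] = closeness / cf_th * cmax
    df = cf / cf.sum(axis=1, keepdims=True)
    weights = (cf * df).sum(axis=0)[imin]  # selection weight of each informative sample
    probs = weights / weights.sum()

    # Step 5: cluster the kept minority samples with threshold cp * d_avg.
    D_mm = pairwise_dist(X_minf, X_minf)
    np.fill_diagonal(D_mm, np.inf)
    labels = average_linkage_clusters(X_minf, cp * D_mm.min(axis=1).mean())

    # Step 6: s = x + alpha * (y - x), with y drawn from x's cluster.
    synthetic = np.empty((n_synthetic, dim))
    for i in range(n_synthetic):
        x_idx = rng.choice(imin, p=probs)
        y_idx = rng.choice(np.flatnonzero(labels == labels[x_idx]))
        alpha = rng.random()
        synthetic[i] = X_minf[x_idx] + alpha * (X_minf[y_idx] - X_minf[x_idx])
    return synthetic


if __name__ == "__main__":
    gen = np.random.default_rng(0)
    X_maj = gen.normal(0.0, 1.0, size=(100, 2))
    X_min = gen.normal(2.5, 0.5, size=(15, 2))
    # Generate enough samples to balance the two classes.
    X_new = mwmote_sketch(X_maj, X_min, n_synthetic=len(X_maj) - len(X_min), rng=1)
    print(X_new.shape)  # (85, 2)
```

Because each synthetic point is a convex combination of two minority samples from the same cluster, it always lies inside that cluster's span. The abstract's observation that the points end up too dense near cluster borders follows from Step 4, which favours minority samples close to the majority class.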

Author Biographies

Pramana Yoga Saputra, Politeknik Negeri Malang, Malang

Department of Informatics Engineering (Jurusan Teknik Informatika)

Moch Zawaruddin Abdullah, Politeknik Negeri Malang, Malang

Department of Informatics Engineering (Jurusan Teknik Informatika)

Annisa Puspa Kirana, Politeknik Negeri Malang, Malang

Department of Informatics Engineering (Jurusan Teknik Informatika)

