Improvisasi Teknik Oversampling MWMOTE Untuk Penanganan Data Tidak Seimbang (Improvising the MWMOTE Oversampling Technique for Handling Imbalanced Data)
DOI:
https://doi.org/10.30865/mib.v5i2.2811
Keywords:
Improvising, Oversampling, MWMOTE, Imbalance Data, Data Mining
Abstract
Imbalanced data is a condition in which the classes differ greatly in size, yielding a majority class (a class with very many members) and a minority class (a class with very few members). This complicates classification because most machine learning algorithms are designed to classify data that is already balanced. Oversampling resolves the imbalance by adding synthetic data to the minority class until it has the same volume of data as the majority class. MWMOTE is an oversampling technique that generates synthetic data based on members of the minority class clusters that are close to the majority class. This approach generates synthetic data well. However, the resulting synthetic data remains near the majority region and is too dense at the cluster borders, which allows it to be classified into the majority class. This study aims to improve the process of generating synthetic data in MWMOTE so that the resulting data is widely distributed within the minority class. The test results show that the proposed method improves classification performance for KNN and the C4.5 decision tree by 0.46% and 0.96%, respectively, compared with MWMOTE.
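The oversampling idea summarized in the abstract, adding synthetic minority points until the class sizes match, can be illustrated with a minimal sketch. It uses plain SMOTE-style interpolation between a minority sample and one of its nearest minority neighbours. It is neither MWMOTE's cluster-weighted selection nor the method proposed in this study, and the function name `oversample_minority`, the parameters `k` and `seed`, and the toy data are assumptions made only for illustration.

```python
import math
import random


def oversample_minority(minority, n_majority, k=3, seed=0):
    """Return the minority points plus synthetic ones, n_majority rows in total.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours
    (SMOTE-style interpolation, used here as a stand-in; hypothetical helper).
    """
    rng = random.Random(seed)
    points = [tuple(float(v) for v in p) for p in minority]
    n_min = len(points)
    if n_min < 2:
        # Interpolation needs at least two minority samples.
        return points
    k = min(k, n_min - 1)

    # For every minority point, the indices of its k nearest minority neighbours.
    neighbours = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(n_min) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbours.append(others[:k])

    balanced = list(points)
    while len(balanced) < n_majority:
        i = rng.randrange(n_min)       # seed sample
        j = rng.choice(neighbours[i])  # one of its nearest neighbours
        gap = rng.random()             # position along the segment, in [0, 1)
        balanced.append(
            tuple(a + gap * (b - a) for a, b in zip(points[i], points[j]))
        )
    return balanced


# Toy example: 4 minority samples, a majority class assumed to have 10 members.
X_minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
X_balanced = oversample_minority(X_minority, n_majority=10)
print(len(X_balanced))
```

Because every synthetic point is a convex combination of two existing minority samples, the new data stays inside the region the minority class already spans. Which seed samples are chosen, and how widely the synthetic points spread, is the part that MWMOTE and the proposed improvement handle differently.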
License

This work is licensed under a Creative Commons Attribution 4.0 International License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).