Handling Unbalanced Data Sets Using DBMUTE and NearMiss Methods to Improve Classification Performance of Yeast Data Sets
DOI:
https://doi.org/10.30865/mib.v7i3.6306
Keywords:
Imbalance Data, DBMUTE, NearMiss, Support Vector Machine, Undersampling
Abstract
Yeast vacuole biogenesis was chosen as a model system for organelle assembly because most vacuole functions are dispensable for vegetative cell growth. It is therefore possible to generate an extensive collection of mutants with defects in vacuole assembly. With this in mind, the class imbalance in yeast data must be addressed. Data are imbalanced when the class distribution is uneven, that is, when one class has many more or many fewer samples than the other classes. Our method evaluates DBMUTE and NearMiss undersampling using two performance metrics, the F1 score and balanced accuracy. Previously, only a few studies have reported results using these metrics. This study compares imbalanced yeast datasets after applying each undersampling method, measures the F1 score and balanced accuracy, and identifies the best score across the datasets. After testing five yeast datasets, we averaged the F1 score and balanced accuracy: the highest average F1 score, obtained with NearMiss, was 62.23%, and the highest average balanced accuracy was 78.59%.
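To make the quantities in the abstract concrete, the following is a minimal pure-Python sketch of NearMiss-style undersampling and of the two reported metrics. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which NearMiss variant was used, so the NearMiss-1 rule is assumed here, and the function names (`near_miss_1`, `f1_score`, `balanced_accuracy`) and the toy data are hypothetical. DBMUTE and the Support Vector Machine classifier are not sketched.

```python
from math import dist  # Euclidean distance, Python 3.8+


def near_miss_1(majority, minority, n_keep, k=3):
    """Assumed NearMiss-1 rule: keep the n_keep majority points whose
    mean distance to their k nearest minority points is smallest."""
    def score(point):
        nearest = sorted(dist(point, m) for m in minority)[:k]
        return sum(nearest) / len(nearest)
    return sorted(majority, key=score)[:n_keep]


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(t == c for t in y_true)
        hits = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Toy data (hypothetical): 4 majority points, 2 minority points.
majority = [(0, 0), (5, 5), (1, 1), (9, 9)]
minority = [(1, 0), (0, 1)]
kept = near_miss_1(majority, minority, n_keep=2, k=2)
print(kept)  # [(0, 0), (1, 1)]

# Toy predictions on an imbalanced label set (2 positives, 8 negatives).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(f1_score(y_true, y_pred))           # 0.5
print(balanced_accuracy(y_true, y_pred))  # 0.6875
```

In the toy predictions, plain accuracy would be 0.8 even though only half of the minority class is recovered. The F1 score and balanced accuracy both weight the minority class more heavily, which is why they suit imbalanced data.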
License

This work is licensed under a Creative Commons Attribution 4.0 International License