Handling Unbalanced Data Sets Using DBMUTE and NearMiss Methods to Improve Classification Performance of Yeast Data Sets

 (*) Bima Mahardika Wirawan (Telkom University, Bandung, Indonesia)
 Mahendra Dwifebri Purbolaksono (Telkom University, Bandung, Indonesia)
 Fhira Nhita (Telkom University, Bandung, Indonesia)

(*) Corresponding Author

Submitted: June 9, 2023; Published: July 23, 2023

Abstract

Yeast vacuole biogenesis was chosen as a model system for organelle assembly because most vacuolar functions are dispensable for vegetative cell growth. It is therefore possible to generate an extensive collection of mutants with defects in vacuole assembly. With this in mind, we must address the imbalanced class structure of yeast data. Data are imbalanced when the class distribution is uneven, that is, when one class contains many more or many fewer samples than the other classes. We apply two undersampling methods, DBMUTE and NearMiss, and evaluate them with two performance metrics: the F1 score and balanced accuracy. Few previous studies have reported results using both of these metrics. This study therefore compares the two undersampling methods on imbalanced datasets and measures their F1 score and balanced accuracy to find the best score on the yeast datasets. After testing on five yeast datasets and averaging both metrics, NearMiss achieved the highest average F1 score, 62.23%, and the highest average balanced accuracy was 78.59%.
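The two evaluation metrics named in the abstract, and the idea behind NearMiss undersampling, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes binary labels with `1` as the minority (positive) class, implements the NearMiss-1 variant (keep the majority samples whose mean Euclidean distance to their k nearest minority samples is smallest), and the function names `near_miss_1`, `f1_score`, and `balanced_accuracy` are hypothetical. The paper may use a different NearMiss version, a different k, or a library implementation. DBMUTE is not sketched because its details are not given here.

```python
import math


def near_miss_1(X, y, minority=1, k=3):
    """Hypothetical NearMiss-1 sketch for binary labels.

    Keeps every minority sample plus an equal number of majority samples:
    those whose mean Euclidean distance to their k nearest minority samples
    is smallest. Assumes at least one minority sample is present.
    """
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]

    def mean_dist_to_minority(x):
        nearest = sorted(math.dist(x, m) for m in minority_pts)[:k]
        return sum(nearest) / len(nearest)

    # Rank majority samples by closeness to the minority class; keep the closest ones.
    ranked = sorted(majority_idx, key=lambda i: mean_dist_to_minority(X[i]))
    keep = set(ranked[:len(minority_pts)])
    keep.update(i for i, label in enumerate(y) if label == minority)
    order = sorted(keep)  # preserve the original sample order
    return [X[i] for i in order], [y[i] for i in order]


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so every class counts equally regardless of size."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(t == c for t in y_true)
        hits = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# A classifier that always predicts the majority class on a 9:1 split.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
print(f1_score(y_true, y_pred))           # 0.0
print(balanced_accuracy(y_true, y_pred))  # 0.5

# NearMiss-1 on a toy 2-D set: two minority points, four majority points.
X = [(0, 0), (1, 0), (3, 3), (0, 1), (5, 5), (8, 8)]
y = [1, 0, 0, 1, 0, 0]
X_res, y_res = near_miss_1(X, y, k=2)
print(y_res)                              # [1, 0, 0, 1]
```

The majority-only classifier would score 90% plain accuracy on that 9:1 split, yet its F1 score is 0 and its balanced accuracy is 0.5, which is why these two metrics suit imbalanced data. In practice, library implementations such as imbalanced-learn's `NearMiss` and scikit-learn's `f1_score` and `balanced_accuracy_score` would normally be used instead of hand-written versions.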

Keywords


Imbalanced Data; DBMUTE; NearMiss; Support Vector Machine; Undersampling


Copyright (c) 2023 JURNAL MEDIA INFORMATIKA BUDIDARMA

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.



JURNAL MEDIA INFORMATIKA BUDIDARMA
STMIK Budi Darma
Secretariat: Sisingamangaraja No. 338, Tel. 061-7875998
Email: mib.stmikbd@gmail.com
