Cancer Detection Based on Microarray Data Using the Naïve Bayes Method and Hybrid Feature Selection
DOI:
https://doi.org/10.30865/mib.v4i3.2096
Keywords:
Cancer, Microarray, Naïve Bayes, Information Gain, Genetic Algorithm, Hybrid Feature Selection
Abstract
Cancer is a deadly disease that was responsible for 9.6 million deaths in 2018 according to WHO data, so early detection is needed so that cancer can be treated promptly and deaths reduced. Microarray technology can monitor and analyze the expression of cancer genes, but microarray data have high dimensionality and few samples, so dimension reduction is needed for effective classification. Dimension reduction lowers the number of features used in classification by selecting the most influential ones. A hybrid method is one approach to dimension reduction: it combines a filter method with a wrapper method to gain the advantages of both. In this study, the researchers combined Naïve Bayes with Hybrid Feature Selection (Information Gain - Genetic Algorithm) on microarray data for lung cancer, ovarian cancer, breast cancer, colon tumors, and prostate tumors. The data were obtained from the Kent-Ridge Biomedical Dataset. Of the 5 datasets used, 4 reached accuracies between 87% and 100%, while the prostate tumor data had the lowest accuracy, 61.14%. The feature selection and classification used fewer than 63 features on the 5 cancer datasets to obtain these accuracies.
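The filter-then-wrapper idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the median-based binarization for Information Gain, the Gaussian form of Naïve Bayes, the leave-one-out fitness, the Genetic Algorithm settings (population size, generations, mutation rate), and the synthetic toy data are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    # Assumption: binarize each feature at its median before computing IG.
    threshold = sorted(column)[len(column) // 2]
    gain = entropy(labels)
    for side in (True, False):
        part = [y for x, y in zip(column, labels) if (x >= threshold) == side]
        if part:
            gain -= len(part) / len(labels) * entropy(part)
    return gain


def filter_by_information_gain(X, y, k):
    # Filter stage: keep the indices of the k highest-IG features.
    gains = [information_gain([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)[:k]


def naive_bayes_predict(X_train, y_train, x, features):
    # Gaussian Naive Bayes: log prior plus per-feature Gaussian log likelihoods.
    best_label, best_score = None, -math.inf
    for label in sorted(set(y_train)):
        rows = [r for r, lab in zip(X_train, y_train) if lab == label]
        score = math.log(len(rows) / len(X_train))
        for j in features:
            values = [r[j] for r in rows]
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values) + 1e-6
            score -= 0.5 * math.log(2 * math.pi * var) + (x[j] - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def loo_accuracy(X, y, features):
    # Leave-one-out accuracy of Naive Bayes; used as the GA fitness (assumed).
    if not features:
        return 0.0
    hits = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += naive_bayes_predict(X_train, y_train, X[i], features) == y[i]
    return hits / len(X)


def genetic_search(X, y, candidates, generations=15, pop_size=12, seed=0):
    # Wrapper stage: evolve bit masks over the filtered candidates (needs >= 2).
    rng = random.Random(seed)

    def fitness(mask):
        return loo_accuracy(X, y, [f for f, bit in zip(candidates, mask) if bit])

    population = [[rng.randint(0, 1) for _ in candidates] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # the fitter half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(candidates))  # one-point crossover
            child = [bit ^ (rng.random() < 0.05) for bit in a[:cut] + b[cut:]]
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return [f for f, bit in zip(candidates, best) if bit]


def make_toy_data(n_samples=24, n_features=30, seed=1):
    # Synthetic stand-in for microarray data: only features 3 and 7 carry signal.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.uniform(-1, 1) for _ in range(n_features)]
        row[3] += 6 * label
        row[7] -= 6 * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    top = filter_by_information_gain(X, y, k=5)
    selected = genetic_search(X, y, top)
    print("filtered:", top, "selected:", selected, "accuracy:", loo_accuracy(X, y, selected))
```

In this sketch the filter cheaply discards most features before the wrapper runs, so the Genetic Algorithm only has to search subsets of a short candidate list. That division of labor is the advantage the abstract attributes to hybrid methods.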
License

This work is licensed under a Creative Commons Attribution 4.0 International License