Analysis and Implementation of the Active Fuzzy Constrained Clustering Algorithm for Document Clustering
DOI:
https://doi.org/10.30865/jurikom.v9i2.3980
Keywords:
Document, Active Fuzzy Constrained Clustering, Pairwise Constraint, BBC News Archives, Confusion Matrix
Abstract
Automatic clustering of text documents has become an important research topic as the volume of text documents distributed through digital media grows rapidly. Document clustering is a method of grouping documents based on their similarity. To group the documents, this research uses the Active Fuzzy Constrained Clustering (AFCC) algorithm, which combines fuzzy and semi-supervised clustering methods. Each text document is treated as a bag of words, and its meaningful words are weighted using the Vector Space Model (VSM). The AFCC algorithm is characterized by its use of pairwise constraints and cluster centroids. The input documents tested in this research are a collection of documents from the BBC News Archives. Using the maximum number of clusters, the maximum number of constraints per iteration, and the maximum number of iterations as parameters, the AFCC algorithm groups the text documents, which are news articles. Clustering performance is measured with the Confusion Matrix approach, which yields an average precision and recall of 0.53 and an accuracy of 0.52.
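The precision, recall, and accuracy figures reported in the abstract can be illustrated with a minimal Python sketch of how such values are typically derived from a confusion matrix. This is not the authors' implementation. It assumes that each cluster has already been mapped to a news category and that "average" means an unweighted (macro) average over categories; the abstract states neither. The function names and the toy labels below are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Build a matrix where rows are true classes and columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def macro_metrics(matrix):
    """Return macro-averaged precision and recall, plus overall accuracy."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    precisions, recalls = [], []
    for i in range(n):
        predicted_i = sum(matrix[r][i] for r in range(n))  # column sum
        actual_i = sum(matrix[i])                          # row sum
        precisions.append(matrix[i][i] / predicted_i if predicted_i else 0.0)
        recalls.append(matrix[i][i] / actual_i if actual_i else 0.0)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "accuracy": correct / total if total else 0.0,
    }


# Toy data (hypothetical, not taken from the paper).
classes = ["business", "sport", "tech"]
true_labels = ["business", "business", "sport", "sport", "tech", "tech"]
predicted_labels = ["business", "sport", "sport", "sport", "tech", "business"]

matrix = confusion_matrix(true_labels, predicted_labels, classes)
metrics = macro_metrics(matrix)
print(matrix)
print(metrics)
```

On this toy data, four of the six documents fall on the diagonal of the matrix, so the accuracy is 4/6. Precision is computed per column and recall per row before averaging.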



