Improving K-Means Clustering Performance Through Distance Weighting Using Principal Component Analysis
DOI:
https://doi.org/10.30865/json.v7i2.9394
Keywords:
K-Means algorithm; Attribute weighting; Principal Component Analysis; Clustering; Sum of Square Error (SSE).
Abstract
The K-Means algorithm has several weaknesses. One of them lies in the distance model used to measure similarity between data points, which treats every attribute equally, so attributes that are less relevant and contribute little to the variation in the data can still have a considerable effect on the clustering result. This can degrade the performance of K-Means. Attribute weighting is one way to capture how strongly each attribute correlates with the variation in the data: the higher an attribute's weight, the stronger its correlation with that variation, whereas an attribute with a low weight contributes little to the variation yet can still noticeably affect clustering performance and results. In this study, the attribute weights are computed using Principal Component Analysis (PCA). To test the proposed method, the study uses datasets from the UCI Machine Learning Repository, consisting of 351 Ionosphere records and 4,177 Abalone records, as well as 1,096 air quality records from the Pekanbaru City Air Laboratory and 120 water quality records. Clustering performance is evaluated using the Sum of Square Error (SSE). The test results show that the proposed method yields significantly smaller SSE values.
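The abstract does not state how the PCA-based weights are derived, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each attribute's weight is the sum of its absolute PCA loadings, each scaled by the explained-variance ratio of its component, and that these weights enter a weighted squared Euclidean distance inside an otherwise standard K-Means loop. The function names `pca_attribute_weights` and `weighted_kmeans` are hypothetical.

```python
import numpy as np


def pca_attribute_weights(X):
    """Return one weight per attribute, normalized to sum to 1.

    Assumed scheme (not taken from the paper): weight_j is the sum over
    components of |loading_jk| * explained_variance_ratio_k.
    """
    Xc = X - X.mean(axis=0)                 # center each attribute
    cov = np.cov(Xc, rowvar=False)          # attribute covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # columns of eigvecs are components
    eigvals = np.clip(eigvals, 0.0, None)   # guard against tiny negative values
    ratio = eigvals / eigvals.sum()         # explained-variance ratio
    w = np.abs(eigvecs) @ ratio
    return w / w.sum()


def _weighted_sq_dist(X, centers, weights):
    # d[i, c] = sum_j weights[j] * (X[i, j] - centers[c, j]) ** 2
    diff = X[:, None, :] - centers[None, :, :]
    return (weights * diff ** 2).sum(axis=2)


def weighted_kmeans(X, k, weights, n_iter=100, seed=0):
    """Lloyd-style K-Means with a weighted squared Euclidean distance."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = _weighted_sq_dist(X, centers, weights).argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment and SSE, computed with the same weighted distance
    # (an assumption; the abstract does not say which distance the SSE uses).
    d = _weighted_sq_dist(X, centers, weights)
    labels = d.argmin(axis=1)
    sse = d[np.arange(len(X)), labels].sum()
    return labels, centers, sse
```

A call such as `weighted_kmeans(X, k=3, weights=pca_attribute_weights(X))` would run the weighted variant, while passing `np.ones(X.shape[1])` as the weights reduces it to plain K-Means with an unweighted SSE.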
License
Copyright (c) 2025 Jurnal Sistem Komputer dan Informatika (JSON)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

This work is licensed under a Creative Commons Attribution 4.0 International License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).

