Improving K-Means Clustering Performance through Distance Weighting Using Principal Component Analysis

Authors

  • Alfian Agusnady, Universitas Sumatera Utara
  • Opim Salim Sitompul Sitompul, Universitas Sumatera Utara
  • Tulus Tulus, Universitas Sumatera Utara

DOI:

https://doi.org/10.30865/json.v7i2.9394

Keywords:

K-Means algorithm; Attribute weighting; Principal Component Analysis; Clustering; Sum of Squared Errors (SSE).

Abstract

The K-Means algorithm has several weaknesses. One of them lies in the distance model used to measure similarity between data points: it treats every attribute equally, so attributes that are less relevant and contribute little to the variation in the data can still noticeably affect the clustering result, which lowers the performance of K-Means. Attribute weighting is one way to capture how strongly each attribute is associated with the variation in the data. The higher an attribute's weight, the more strongly it is associated with that variation; conversely, an attribute with a low weight contributes little to the variation, yet without weighting it can still noticeably affect clustering performance and results. In this study, attribute weights are computed with Principal Component Analysis (PCA). To evaluate the proposed method, the study uses the Ionosphere (351 instances) and Abalone (4,177 instances) datasets from the UCI Machine Learning Repository, together with 1,096 air-quality records from the Pekanbaru City Air Laboratory and 120 water-quality records. Clustering performance is evaluated with the Sum of Squared Errors (SSE). The experimental results show that the proposed method yields significantly smaller SSE values than the standard (unweighted) K-Means algorithm.
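The abstract does not give the exact weighting formula, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes: z-score standardization, PCA on the resulting correlation matrix, retaining the leading components that explain 90% of the variance, scoring attribute j as the sum over those components of (explained-variance ratio) × |loading|, normalizing the scores to sum to 1, and clustering with the weighted squared Euclidean distance d_w(x, c) = Σ_j w_j (x_j − c_j)². SSE is measured with that same weighted distance. All function names (`standardize`, `jacobi_eigen`, `pca_attribute_weights`, `weighted_kmeans`, and so on) are hypothetical, and a small Jacobi eigen-solver is included so the sketch runs on the Python standard library alone.

```python
import math
import random


def standardize(X):
    """Z-score every column (population std); a constant column becomes all zeros."""
    n, d = len(X), len(X[0])
    mu = [sum(r[j] for r in X) / n for j in range(d)]
    sd = [math.sqrt(sum((r[j] - mu[j]) ** 2 for r in X) / n) or 1.0 for j in range(d)]
    return [[(r[j] - mu[j]) / sd[j] for j in range(d)] for r in X]


def covariance(Z):
    """Covariance matrix of column-centred data (divides by n)."""
    n, d = len(Z), len(Z[0])
    return [[sum(r[a] * r[b] for r in Z) / n for b in range(d)] for a in range(d)]


def jacobi_eigen(A, max_sweeps=100, tol=1e-20):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, V); column k of V is the eigenvector for eigenvalues[k]."""
    d = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(d)] for i in range(d)]
    for _ in range(max_sweeps):
        if sum(A[p][q] ** 2 for p in range(d) for q in range(d) if p != q) < tol:
            break
        for p in range(d - 1):
            for q in range(p + 1, d):
                if abs(A[p][q]) < 1e-15:
                    continue
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for M in (A, V):  # right-multiply A and V by the rotation (columns p, q)
                    for k in range(d):
                        mp, mq = M[k][p], M[k][q]
                        M[k][p], M[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(d):  # left-multiply A by the transposed rotation (rows p, q)
                    ap, aq = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * ap - s * aq, s * ap + c * aq
    return [A[i][i] for i in range(d)], V


def pca_attribute_weights(Z, var_threshold=0.9):
    """Assumed weighting rule (not taken from the paper): over the leading
    components that explain `var_threshold` of the variance, score attribute j
    by sum_k ratio_k * |V[j][k]|, then normalise the scores to sum to 1."""
    lam, V = jacobi_eigen(covariance(Z))
    lam = [max(v, 0.0) for v in lam]  # clip tiny negative round-off
    d, total = len(lam), sum(lam)
    weights, explained = [0.0] * d, 0.0
    for k in sorted(range(d), key=lambda i: lam[i], reverse=True):
        ratio = lam[k] / total  # explained-variance ratio of component k
        for j in range(d):
            weights[j] += ratio * abs(V[j][k])
        explained += ratio
        if explained >= var_threshold:
            break
    s = sum(weights)
    return [wj / s for wj in weights]


def weighted_sq_dist(x, c, w):
    """Weighted squared Euclidean distance: sum_j w_j * (x_j - c_j)**2."""
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))


def assign(X, centroids, w):
    """Index of the nearest centroid (under the weighted distance) for each row."""
    return [min(range(len(centroids)), key=lambda c: weighted_sq_dist(x, centroids[c], w))
            for x in X]


def weighted_kmeans(X, k, w, max_iter=100, seed=0):
    """Lloyd-style K-Means using the weighted distance; returns (labels, centroids, SSE)."""
    centroids = [list(r) for r in random.Random(seed).sample(X, k)]
    for _ in range(max_iter):
        labels = assign(X, centroids, w)
        new = []
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            # With fixed per-attribute weights the plain mean still minimises the weighted SSE.
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    labels = assign(X, centroids, w)
    sse = sum(weighted_sq_dist(x, centroids[lab], w) for x, lab in zip(X, labels))
    return labels, centroids, sse
```

A typical use is `Z = standardize(X)`, `w = pca_attribute_weights(Z)`, then comparing `weighted_kmeans(Z, k, w)` with `weighted_kmeans(Z, k, [1 / d] * d)` as the unweighted baseline. Giving both weight vectors the same total keeps the two SSE values on a comparable scale, since the SSE here is measured in the weighted distance.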


Published

2025-12-31

How to Cite

Agusnady, A., Sitompul, O. S. S., & Tulus, T. (2025). Peningkatan Kinerja K-Means Clustering Berdasarkan Pembobotan Jarak Menggunakan Metode Principal Component Analysis. Jurnal Sistem Komputer Dan Informatika (JSON), 7(2), 711–718. https://doi.org/10.30865/json.v7i2.9394