Analysis of the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) Method for Detecting Outlier Data

(*) Dedy Armiady (Universitas Almuslim, Bireuen, Indonesia)

(*) Corresponding Author

Abstract

An outlier is a data point that differs markedly from the rest of the data in a dataset. If not handled properly, outliers can distort the results of data analysis. Various approaches can be taken to detect outliers, one of which is clustering (grouping data). DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering method that can find outliers in a dataset. DBSCAN determines clusters based on data density, using two parameters: epsilon (the neighborhood range) and MinPts (the minimum number of points needed to form a cluster). This study tests several DBSCAN models with different epsilon and MinPts parameter settings. Three models are used: Model 1 (eps = 0.2, MinPts = 5), Model 2 (eps = 0.3, MinPts = 5), and Model 3 (eps = 0.4, MinPts = 5). The dataset was generated with the Paint Data feature of the Orange Data Mining tool and has 2 variables (x and y) and 1051 records. All tested models identified the same single data point as an outlier, namely the point at x = 0.370007 and y = 0.410475. The study also concludes that the epsilon value affects the number of clusters formed: the higher the epsilon value, the fewer clusters tend to be formed.
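
To make the role of epsilon and MinPts concrete, the following is a minimal sketch of DBSCAN in plain Python, run with the three parameter settings named in the abstract. It is not the author's implementation (the study used Orange Data Mining), and the points are a small hypothetical toy dataset, not the 1051-record dataset from the paper. The function name `dbscan`, the helper `blob`, and the coordinates are assumptions made only for illustration.

```python
import math

NOISE = -1  # label used for points that belong to no cluster (outliers)


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within distance eps of point i (including i).
        xi, yi = points[i]
        return [j for j in range(n)
                if math.hypot(xi - points[j][0], yi - points[j][1]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # not a core point; may later become a border point
            continue
        cluster += 1           # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # previously noise, now a border point
            if labels[j] is not None:
                continue             # already assigned to a cluster
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(nbrs)
    return labels


def blob(x0):
    # Hypothetical dense 3x3 grid of points, 0.05 apart, starting at x = x0.
    return [(x0 + 0.05 * i, 0.05 * j) for i in range(3) for j in range(3)]


# Toy data (assumption): three dense groups separated by gaps of 0.25 and 0.35,
# plus one isolated point far away from everything else.
points = blob(0.0) + blob(0.35) + blob(0.80) + [(3.0, 3.0)]

results = {}
for eps in (0.2, 0.3, 0.4):  # the three eps values from the abstract, MinPts = 5
    labels = dbscan(points, eps, min_pts=5)
    n_clusters = len(set(labels) - {NOISE})
    outliers = [points[i] for i, lab in enumerate(labels) if lab == NOISE]
    results[eps] = (n_clusters, outliers)
    print(f"eps={eps}: {n_clusters} cluster(s), outliers={outliers}")
```

On this toy data the sketch reports three, two, and one clusters for eps = 0.2, 0.3, and 0.4 respectively, and the isolated point (3.0, 3.0) is labeled as noise in every run. This mirrors the pattern described in the abstract, but says nothing about the actual cluster counts in the study.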

Keywords


Data Outlier; DBSCAN; Data Mining; Epsilon; Clustering; MinPts


Copyright (c) 2022 Dedy Armiady

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

JURIKOM (Jurnal Riset Komputer)
Published by Universitas Budi Darma (formerly STMIK BUDI DARMA (P3M))
Email: jurikom.stmikbd@gmail.com

Creative Commons License
 This work is licensed under a Creative Commons Attribution 4.0 International.