Analysis of the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) Method for Detecting Outlier Data


Authors

Dedy Armiady, Universitas Almuslim, Bireuen

DOI:

https://doi.org/10.30865/jurikom.v9i6.5080

Keywords:

Data Outlier, DBSCAN, Data Mining, Epsilon, Clustering, MinPts

Abstract

Outlier data are data points that differ markedly from the rest of a dataset. If not handled properly, outliers can distort the results of data analysis. Various approaches can be used to detect outliers, one of which is clustering (grouping data). DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering method that can identify outliers in a dataset. DBSCAN forms clusters based on data density, using two parameters: epsilon (the neighborhood radius) and MinPts (the minimum number of points required within that radius to form a dense region, i.e. a cluster). This study tests three DBSCAN models with different parameter settings: Model 1 (eps = 0.2, MinPts = 5), Model 2 (eps = 0.3, MinPts = 5) and Model 3 (eps = 0.4, MinPts = 5). The dataset was generated with the Paint Data feature of the Orange Data Mining tool and contains 1,051 records with two variables (x and y). All tested models identified the same single outlier: the data point with x = 0.370007 and y = 0.410475. The results also indicate that the epsilon value affects the number of clusters formed: the higher the epsilon value, the fewer clusters tend to be formed.
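The density rule described in the abstract can be made concrete with a small sketch. The Python code below is a minimal, stdlib-only DBSCAN written for illustration; it is not the implementation used by Orange or by the author, and the function names (`region_query`, `dbscan`, `summarize`, `grid`) and the toy points are hypothetical. It assumes Euclidean distance and, as in the original definition by Ester et al. (1996), counts the point itself toward MinPts. Points that cannot be reached from any core point keep the noise label -1, which is how DBSCAN flags outliers. The loop at the end reruns the three eps settings from the abstract (0.2, 0.3, 0.4, with MinPts = 5) on synthetic data, not on the paper's 1,051-record dataset.

```python
from math import dist

NOISE = -1  # label given to points that belong to no cluster (outliers)


def region_query(points, i, eps):
    """Indices of all points within Euclidean distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE.

    A point is a core point if its eps-neighborhood (itself included)
    contains at least min_pts points. Clusters grow outward from core
    points; points that no core point can reach stay NOISE.
    """
    labels = [None] * len(points)  # None = not yet visited
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep growing
        cluster_id += 1
    return labels


def summarize(labels):
    """Number of clusters found and the indices of points labeled as noise."""
    n_clusters = len({lab for lab in labels if lab != NOISE})
    outliers = [i for i, lab in enumerate(labels) if lab == NOISE]
    return n_clusters, outliers


def grid(x0, y0, step=0.02, size=3):
    """Hypothetical helper: a size x size grid of points starting at (x0, y0)."""
    return [(x0 + step * dx, y0 + step * dy) for dx in range(size) for dy in range(size)]


# Hypothetical toy data (NOT the paper's Orange dataset): three tight groups
# separated by gaps of about 0.25 and 0.35, plus one isolated point.
points = grid(0.10, 0.10) + grid(0.39, 0.10) + grid(0.78, 0.10) + [(0.45, 0.80)]

if __name__ == "__main__":
    for name, eps in [("Model 1", 0.2), ("Model 2", 0.3), ("Model 3", 0.4)]:
        n_clusters, outliers = summarize(dbscan(points, eps, min_pts=5))
        print(f"{name} (eps={eps}, MinPts=5): {n_clusters} cluster(s), "
              f"outliers: {[points[i] for i in outliers]}")
```

On this toy data the sketch reports 3, 2 and 1 clusters for eps = 0.2, 0.3 and 0.4, because the two gaps between the groups are bridged one at a time as eps grows, while the isolated point (0.45, 0.8) stays labeled as noise at every setting. This mirrors the qualitative pattern described above but does not reproduce the paper's results. The neighborhood search is a brute-force O(n²) scan to keep the sketch short; larger datasets would normally use a spatial index.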

