Application of the Winnowing Algorithm and Word-Level Trigrams to Identify Word Similarity
DOI:
https://doi.org/10.30865/jurikom.v9i2.4060
Keywords:
Winnowing, Rolling-Hash, Word-Level, Trigrams, Similarity
Abstract
Identifying words that appear in two or more texts is the first step in detecting plagiarism. Plagiarism detection software is commercially available but relatively expensive, and the free offerings provide very limited features. A freely accessible word similarity detection system is therefore needed as an alternative for users. Pattern matching is one way to find word similarity between documents. Several algorithms can be used for this purpose, including the Winnowing algorithm, which is known to perform well at detecting word similarity. Winnowing is a hashing-based algorithm that applies a hash function and window formation to obtain fingerprints during pattern matching. The level of word similarity can then be calculated from these fingerprints. Previous studies have mostly calculated similarity at the character level, while calculations at the word level remain limited. This research aims to measure the level of word similarity using the Winnowing algorithm and word-level trigrams. The results show that the Winnowing algorithm applied with word-level trigrams detected text similarity levels of 76.84%, 52.29%, 37.40%, and 19.29% for the texts tested. It can be concluded that pattern matching with the Winnowing algorithm and word-level trigrams can be used to measure the level of similarity between texts.
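The pipeline described in the abstract (word-level trigrams, hashing, window formation, fingerprint comparison) can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenization rule, the polynomial hash with its `base` and `mod` values, the window size of 4, and the use of the Jaccard coefficient as the similarity measure are all assumptions chosen for illustration.

```python
import re


def word_trigrams(text):
    """Split text into lowercase words and return overlapping 3-word grams."""
    words = re.findall(r"\w+", text.lower())
    return [" ".join(words[i:i + 3]) for i in range(len(words) - 2)]


def poly_hash(s, base=31, mod=1_000_003):
    """Deterministic polynomial hash of a string (base and mod are assumed values)."""
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % mod
    return h


def winnow(hashes, window=4):
    """Slide a window over the hash sequence and keep the minimum of each window."""
    if not hashes:
        return set()
    if len(hashes) < window:
        # Too few hashes to fill one window: fall back to a single fingerprint.
        return {min(hashes)}
    return {min(hashes[i:i + window]) for i in range(len(hashes) - window + 1)}


def similarity(text_a, text_b, window=4):
    """Jaccard coefficient of the two fingerprint sets, as a percentage (assumed measure)."""
    fa = winnow([poly_hash(g) for g in word_trigrams(text_a)], window)
    fb = winnow([poly_hash(g) for g in word_trigrams(text_b)], window)
    if not fa and not fb:
        return 0.0
    return 100.0 * len(fa & fb) / len(fa | fb)
```

Calling `similarity(doc_a, doc_b)` on two strings returns a percentage between 0 and 100. This sketch stores only the fingerprint values; a fuller version would also record the position of each selected hash so that matching passages can be located in the text.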