Jaccard Similarity Algorithm for Detecting Similarity in Dissertation Titles Using a Stop Word Removal Variation Approach

 (*)Liga Mayola Mail (Universitas Putra Indonesia YPTK, Padang, Indonesia)
 M. Hafizh (Universitas Putra Indonesia YPTK, Padang, Indonesia)
 Deri Marse Putra (Universitas Putra Indonesia YPTK, Padang, Indonesia)

(*) Corresponding Author

Submitted: December 9, 2023; Published: January 28, 2024


Choosing a unique dissertation title is a challenge. The number of dissertation titles rises as the number of students increases, yet each student's dissertation title must differ from every other student's. One way to anticipate this problem is to adopt a similarity algorithm that detects similar dissertation titles. The algorithm chosen is the Jaccard Similarity algorithm, which can be used to detect similarity between documents. The analysis begins with text preprocessing, which consists of case folding, tokenizing, stop word removal, and stemming. This study tests two variations of stop word removal and compares the accuracy obtained after the titles are analyzed with Jaccard Similarity. The researchers call these variants Stop Word Removal Version One (SWR1) and Stop Word Removal Version Two (SWR2). SWR1 removes only prepositions and conjunctions, whereas SWR2 removes the SWR1 words plus words that often appear in titles but contribute little to their meaning. The aim of this approach is to compare the accuracy that the Jaccard algorithm produces under these two stop word removal approaches. The results show that Jaccard Similarity achieves an accuracy of 97.8% with SWR2 and 57.7% with SWR1. Stop word removal is therefore a critical stage in determining similarity and has a significant influence on the results of the Jaccard algorithm.
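The pipeline described above (case folding, tokenizing, stop word removal, stemming, then Jaccard Similarity J(A, B) = |A ∩ B| / |A ∪ B| over the resulting term sets) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the stop word sets `SWR1` and `SWR2` are small hypothetical samples (the actual lists are not given here), `stem` is an identity placeholder where a real Indonesian stemmer would go, and the two example titles are invented.

```python
import re

# Hypothetical stop word lists for illustration only; the study's real lists are not given.
# SWR1: prepositions and conjunctions only (a small sample of Indonesian ones).
SWR1 = {"di", "ke", "dari", "pada", "dalam", "untuk", "dengan", "terhadap",
        "dan", "atau", "serta"}
# SWR2: SWR1 plus words assumed (for this example) to be frequent in titles
# but contributing little to their meaning.
SWR2 = SWR1 | {"analisis", "pengaruh", "studi", "kasus", "penerapan", "implementasi"}


def stem(token: str) -> str:
    """Placeholder stemmer (identity); a real Indonesian stemmer would replace this."""
    return token


def preprocess(title: str, stopwords: set[str]) -> set[str]:
    """Case folding -> tokenizing -> stop word removal -> stemming; returns a term set."""
    folded = title.lower()                             # case folding
    tokens = re.findall(r"[a-z0-9]+", folded)          # tokenizing on alphanumeric runs
    kept = [t for t in tokens if t not in stopwords]   # stop word removal
    return {stem(t) for t in kept}                     # stemming, deduplicated into a set


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets are scored 0.0 here by convention."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    t1 = "Analisis Pengaruh Media Sosial terhadap Minat Belajar Mahasiswa"
    t2 = "Pengaruh Media Sosial dan Motivasi terhadap Minat Belajar Siswa"
    for name, sw in (("SWR1", SWR1), ("SWR2", SWR2)):
        score = jaccard(preprocess(t1, sw), preprocess(t2, sw))
        print(f"{name}: {score:.4f}")  # SWR1: 0.5556, SWR2: 0.5714
```

For this invented pair, SWR1 yields 5 shared terms out of 9 (0.5556) and SWR2 yields 4 out of 7 (0.5714). The example only shows how the choice of stop word list changes the term sets and therefore the score; it does not reproduce the accuracy figures reported in the study.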


Keywords: Jaccard Similarity; Text Preprocessing; Stop Word Removal; Detection; Similarity

