Jaccard Similarity Algorithm for Detecting Dissertation Title Similarity Using Variations of Stop Word Removal

 (*)Liga Mayola Mail (Universitas Putra Indonesia YPTK, Padang, Indonesia)
 M. Hafizh (Universitas Putra Indonesia YPTK, Padang, Indonesia)
 Deri Marse Putra (Universitas Putra Indonesia YPTK, Padang, Indonesia)

(*) Corresponding Author

Submitted: December 9, 2023; Published: January 28, 2024

Abstract

Choosing a unique dissertation title is a challenge. As the number of students increases, so does the number of dissertation titles, and each student's title must differ from the others. One way to anticipate duplication is to apply a similarity algorithm that detects similar dissertation titles. This study uses the Jaccard Similarity algorithm, which can be used to detect document similarity. The analysis begins with text preprocessing, which consists of case folding, tokenizing, stop word removal, and stemming. Two variations of stop word removal were tested, and the accuracy obtained with Jaccard Similarity was measured for each. The researchers call them Stop Word Removal Version One (SWR1) and Stop Word Removal Version Two (SWR2). In SWR1, only prepositions and conjunctions are removed. SWR2 removes the same words as SWR1, plus words that are often used in titles but do not contribute significantly to a title's meaning. The aim is to compare the accuracy that Jaccard produces under these two stop word removal approaches. The results show an accuracy of 97.8% for Jaccard with SWR2 and 57.7% with SWR1. Stop word removal is therefore a critical stage in determining similarity and has a significant influence on the results of the Jaccard algorithm.
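To make the two stop word removal variants concrete, the following is a minimal Python sketch of Jaccard similarity, J(A, B) = |A ∩ B| / |A ∪ B|, computed over title token sets. It is an illustration under stated assumptions, not the authors' implementation: the word lists SWR1 and SWR2, the two example titles, and the function names are hypothetical, and the stemming stage is omitted for brevity.

```python
import re

# Hypothetical stop word lists; the study's actual lists are not given in the abstract.
# SWR1: prepositions and conjunctions only.
SWR1 = {"dan", "atau", "di", "ke", "dari", "pada", "untuk", "dengan", "yang", "dalam"}
# SWR2: SWR1 plus words that are common in titles but carry little meaning.
SWR2 = SWR1 | {"analisis", "sistem", "penerapan", "implementasi",
               "metode", "algoritma", "menggunakan"}


def preprocess(title, stop_words):
    """Case folding, tokenizing, and stop word removal (stemming omitted)."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return {t for t in tokens if t not in stop_words}


def jaccard(a, b):
    """Return |A ∩ B| / |A ∪ B|; defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def title_similarity(t1, t2, stop_words):
    """Jaccard similarity of two titles after preprocessing."""
    return jaccard(preprocess(t1, stop_words), preprocess(t2, stop_words))


# Made-up example titles on unrelated topics that share generic title words.
t1 = "Analisis Sistem Deteksi Plagiarisme dengan Metode Jaccard"
t2 = "Analisis Sistem Rekomendasi Wisata dengan Metode Cosine"

print(round(title_similarity(t1, t2, SWR1), 3))  # 0.333
print(round(title_similarity(t1, t2, SWR2), 3))  # 0.0
```

In this toy pair, the shared generic words ("analisis", "sistem", "metode") raise the SWR1 score even though the topics differ, while the SWR2 list removes them. This only illustrates the mechanism and does not reproduce the accuracy figures reported above.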

Keywords


Jaccard Similarity; Text Preprocessing; Stop Word Removal; Detection; Similarity





Copyright (c) 2024 JURNAL MEDIA INFORMATIKA BUDIDARMA

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.



JURNAL MEDIA INFORMATIKA BUDIDARMA
STMIK Budi Darma
Secretariat: Sisingamangaraja No. 338 Telp 061-7875998
Email: mib.stmikbd@gmail.com
