Pengembangan Sistem Deteksi Plagiarisme Dokumen Jurnal Berbasis Bidirectional Encoder Representations from Transformers Dan Cosine Similarity

Authors

  • Cahya Yoga Ariyanto, Universitas Teknologi Yogyakarta, Yogyakarta
  • Adam Sekti Aji, Universitas Teknologi Yogyakarta, Yogyakarta

DOI:

https://doi.org/10.30865/jurikom.v12i6.9325

Keywords:

Plagiarism, Cosine Similarity, BERT, TF-IDF, NLP

Abstract

Digital technology has reshaped many fields, including education and the management of scientific documents. Easy access to online journals brings a new challenge: greater potential for plagiarism. Addressing it requires an automated system that detects document similarity quickly and accurately. This study develops a plagiarism detection system based on Cosine Similarity and Bidirectional Encoder Representations from Transformers (BERT). The research stages are text preprocessing, term weighting with Term Frequency–Inverse Document Frequency (TF-IDF), Cosine Similarity computation, BERT model training, and model performance evaluation. The results show that integrating BERT with TF-IDF improves performance over BERT alone. In the experiments, the BERT model with TF-IDF achieved its highest accuracy, 0.9621, in the 10:90 data split scenario, with a precision of 0.8141, a recall of 0.7302, and an F1-score of 0.8022, whereas the BERT model without TF-IDF reached an accuracy of only 0.8529. Applying Cosine Similarity with a threshold of 0.6 also proved effective for separating plagiarized from non-plagiarized documents. These findings indicate that combining BERT and TF-IDF improves the accuracy of plagiarism detection by capturing semantic context and term weighting together.
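As a rough illustration of the TF-IDF weighting and Cosine Similarity stages described above, the following minimal sketch scores document pairs and applies the 0.6 threshold. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the `>=` comparison at the threshold, and all function names are assumptions made for this example, and the BERT stage is left out because the abstract does not say how the two representations are combined.

```python
import math
from collections import Counter

PLAGIARISM_THRESHOLD = 0.6  # threshold value reported in the abstract


def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace only.
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document (assumed smoothed IDF)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def is_plagiarized(score, threshold=PLAGIARISM_THRESHOLD):
    # Assumption: a score equal to the threshold counts as plagiarized.
    return score >= threshold


# Hypothetical toy corpus: two identical texts and one unrelated text.
docs = [
    "deep learning improves plagiarism detection",
    "deep learning improves plagiarism detection",
    "the weather is sunny today",
]
vectors = tfidf_vectors(docs)
same_score = cosine_similarity(vectors[0], vectors[1])
different_score = cosine_similarity(vectors[0], vectors[2])
print(is_plagiarized(same_score), is_plagiarized(different_score))
```

In this toy corpus the identical pair scores at the top of the range and the unrelated pair shares no terms, so only the first pair is flagged. Real journal text would need the fuller preprocessing stage the abstract mentions before weighting.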

Published

2025-12-31

How to Cite

Ariyanto, C. Y., & Aji, A. S. (2025). Pengembangan Sistem Deteksi Plagiarisme Dokumen Jurnal Berbasis Bidirectional Encoder Representations from Transformers Dan Cosine Similarity. JURNAL RISET KOMPUTER (JURIKOM), 12(6), 942–948. https://doi.org/10.30865/jurikom.v12i6.9325