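The Effect of Oversampling and Cross Validation on Machine Learning Models for Sentiment Analysis of the Student Graduation Output Policy (Pengaruh Oversampling dan Cross Validation Pada Model Machine Learning Untuk Sentimen Analisis Kebijakan Luaran Kelulusan Mahasiswa)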
DOI:
https://doi.org/10.30865/mib.v8i1.7012
Keywords:
Graduation Standards, Sentiment Analysis, Naïve Bayes Classifier, K-Nearest Neighbor, Lexicon-based, YouTube, SMOTE
Abstract
The Minister of Education, Culture, Research and Technology issued a new policy on graduation standards for undergraduate and postgraduate students. The policy was announced on August 29, 2023, in a live stream on the Kemendikbudristek YouTube channel during the Merdeka Belajar seminar, episode 26: Transformation of National Standards and Higher Education Accreditation. It drew both positive and negative responses from the public. This research therefore analyzes public sentiment toward the policy, so that the findings can be of use to the community in the future. Two algorithms are compared, the Naïve Bayes Classifier (NBC) and K-Nearest Neighbor (KNN), on a dataset of 1085 comments collected from the YouTube video. The data are pre-processed, including stemming with Sastrawi, and then labeled with a Lexicon-based method. The comments are grouped into positive and negative sentiment, and the labeling results show that the two classes are imbalanced. The Synthetic Minority Over-sampling Technique (SMOTE) is then applied to balance the classes and improve accuracy. After SMOTE, NBC achieves higher accuracy than KNN: NBC reaches an accuracy of 74%, precision of 74.6%, recall of 74%, and f1-score of 73.9%, while KNN reaches an accuracy of 50.2%, precision of 75.2%, recall of 50.2%, and f1-score of 34.5%.
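The oversampling step named in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbors. This is a simplified illustration in plain Python, not the authors' implementation; the function name `smote`, its parameters, and the toy feature vectors are assumptions made for the example and do not come from the study.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Simplified sketch: the base sample is drawn at random for every new point.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base sample, excluding itself
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical 2-D feature vectors (for illustration only, not study data)
majority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.2, 0.4], [0.4, 0.1]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3]]

# Generate just enough synthetic points to match the majority class size
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints "6 6"
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class rather than duplicating samples exactly.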
License

This work is licensed under a Creative Commons Attribution 4.0 International License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).