Perbandingan Metode Naïve Bayes dan Support Vector Machine Pada Klasifikasi 22 Bahasa Daerah

 (*) Bima Rakajati (Universitas Dian Nuswantoro, Semarang, Indonesia)
 Erwin Yudi Hidayat (Universitas Dian Nuswantoro, Semarang, Indonesia)

(*) Corresponding Author

Submitted: December 24, 2023; Published: January 10, 2024

Abstract

Indonesia is culturally diverse, with over 1300 ethnic groups and 2500 regional languages. This multitude of regional languages makes it difficult to identify the language of a given text. This research addresses that identification problem by comparing two machine learning methods, Naïve Bayes and Support Vector Machine, for classifying 22 regional languages of Indonesia, with the aim of characterizing the relative performance of each method. The main constraint of the research lies in the complexity of these languages, whose characteristics, grammar, and sentence structures vary widely, so accuracy remains short of perfect. This leaves room for future research through parameter optimization or exploration of alternative methods. Evaluation results indicate that the Support Vector Machine achieves the highest accuracy, 89.41%, making it the preferred choice for model implementation. Naïve Bayes also yields good results, with an accuracy of 82.08%. An application of the model built with Streamlit demonstrates that the Support Vector Machine accurately predicts the language of Javanese song lyrics. This research can help users identify regional languages from text and contributes to the understanding of machine learning methods for classifying regional-language texts. Despite its limitations, the study can be extended to other regional languages, and model accuracy may be improved through parameter tuning.
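
To make the classification task concrete, the following is a minimal sketch of one of the two compared methods, a multinomial Naïve Bayes classifier, together with the accuracy metric reported above. It is not the authors' implementation: the abstract does not state which features or libraries were used, so the character-bigram features, the add-one smoothing, the class name NaiveBayesLanguageID, and the handful of toy sentences are all assumptions chosen for illustration. The Support Vector Machine side of the comparison is not shown; in practice it would likely come from a library such as scikit-learn.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of a lowercased string."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayesLanguageID:
    """Multinomial Naive Bayes over character n-grams (hypothetical helper).

    Assumption: character bigrams with add-one (Laplace) smoothing.
    The paper's actual feature extraction is not described in the abstract.
    """

    def __init__(self, n=2):
        self.n = n
        self.class_counts = Counter()             # documents seen per language
        self.ngram_counts = defaultdict(Counter)  # n-gram counts per language
        self.vocab = set()                        # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.class_counts[label] += 1
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.ngram_counts[label]
            total = sum(counts.values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(doc_count / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy training data for illustration only; not the paper's dataset.
train_texts = [
    "aku arep lunga menyang pasar", "matur nuwun sanget",   # Javanese
    "abdi bade angkat ka pasar", "hatur nuhun pisan",       # Sundanese
    "saya mau pergi ke pasar", "terima kasih banyak",       # Indonesian
]
train_labels = ["jav", "jav", "sun", "sun", "ind", "ind"]

model = NaiveBayesLanguageID(n=2).fit(train_texts, train_labels)
predictions = [model.predict(t) for t in train_texts]
print("training accuracy:", accuracy(train_labels, predictions))
```

With a realistic corpus, the same accuracy function would be applied to a held-out test split for each classifier, which is the kind of comparison the 89.41% and 82.08% figures summarize.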

Copyright (c) 2024 JURNAL MEDIA INFORMATIKA BUDIDARMA

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.



JURNAL MEDIA INFORMATIKA BUDIDARMA
STMIK Budi Darma
Secretariat: Sisingamangaraja No. 338 Telp 061-7875998
Email: mib.stmikbd@gmail.com
