A Comparative Performance Analysis of NMF and LDA for Topic Modeling of Indonesian Online News
DOI: https://doi.org/10.30865/json.v7i3.9469

Keywords: Topic Modelling, Non-negative Matrix Factorization, Latent Dirichlet Allocation, Natural Language Processing, News Analysis

Abstract
The growth of digital news content in Indonesia has created a need for automated methods to extract the main topics from large-scale news text datasets. This study presents a comparative performance analysis of Non-negative Matrix Factorization (NMF) and Latent Dirichlet Allocation (LDA) for topic modeling of Indonesian online news from three outlets: CNBC Indonesia, Kompas.com, and Detik.com. The dataset consists of 4,500 news articles; preprocessing included tokenization, stopword removal, and feature extraction using TF-IDF for NMF and a count vectorizer for LDA. Performance was evaluated using the coherence score (Cᵥ), topic diversity, silhouette score, and a chi-square test of topic distribution across outlets. The results show that NMF achieved a higher coherence score than LDA (0.7544 vs. 0.5600), better topic diversity (0.9400 vs. 0.8400), and substantially faster training (1.60 s vs. 108.30 s). The chi-square test confirmed a significant difference (p < 0.001) in topic distribution across the three outlets. Based on the evaluation on this dataset, NMF outperformed LDA for topic modeling of Indonesian online news.
Copyright (c) 2026 Jurnal Sistem Komputer dan Informatika (JSON)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


