A Comparison of Common TF-IDF-Based Machine Learning Algorithms for Indonesian-Language Article Classification
Keywords:
Bernoulli Naïve Bayes, K-Nearest Neighbor, Machine Learning, Multinomial Naïve Bayes, News text classification.
Abstract
This study compares the performance of common machine learning algorithms for classifying Indonesian news articles. A dataset of 2160 articles from Detik.com was pre-processed and transformed into feature vectors using Term Frequency-Inverse Document Frequency (TF-IDF). The algorithms tested were Multinomial Naïve Bayes, Bernoulli Naïve Bayes, K-Nearest Neighbor, Random Forest, and AdaBoost. Hyperparameters were tuned using 5-fold cross-validation, and the evaluation metrics were accuracy, precision, recall, and F1-score. The results indicate that Multinomial Naïve Bayes with alpha set to 0.1 achieved the best overall performance, with an accuracy of 0.8781, precision of 0.8138, recall of 0.8143, and F1-score of 0.814.
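The pipeline summarized in the abstract (TF-IDF features, Multinomial Naïve Bayes with additive smoothing alpha, k-fold tuning of alpha, and accuracy/precision/recall/F1 evaluation) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the whitespace tokenizer, the alpha grid, the toy two-category corpus, contiguous unshuffled folds, and macro averaging of precision, recall, and F1 are all assumptions, because the abstract does not specify them. The IDF formula and L2 normalization follow scikit-learn's default `TfidfTransformer` settings, and the likelihood smoothing follows the formula documented for scikit-learn's `MultinomialNB`. The helpers `tokenize`, `fit_idf`, `tfidf`, `evaluate`, and `tune_alpha` are hypothetical names chosen for illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Hypothetical tokenizer (lowercase + whitespace split); the study's preprocessing is not reproduced.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed IDF as in scikit-learn's default: ln((1 + n) / (1 + df)) + 1.
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + len(docs)) / (1 + d)) + 1 for t, d in df.items()}

def tfidf(doc, idf):
    # Raw term count x IDF, then L2-normalized; out-of-vocabulary terms are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

class MultinomialNB:
    def __init__(self, alpha=0.1):
        self.alpha = alpha  # additive smoothing; 0.1 is the value the abstract reports

    def fit(self, X, y):
        # X: list of sparse {term: weight} dicts; y: list of class labels.
        self.classes = sorted(set(y))
        vocab = {t for x in X for t in x}
        self.log_prior = {c: math.log(y.count(c) / len(y)) for c in self.classes}
        totals = {c: defaultdict(float) for c in self.classes}
        for x, c in zip(X, y):
            for t, v in x.items():
                totals[c][t] += v
        self.log_lik = {}
        for c in self.classes:
            # log P(t | c) = log((N_ct + alpha) / (N_c + alpha * |V|))
            denom = sum(totals[c].values()) + self.alpha * len(vocab)
            self.log_lik[c] = {t: math.log((totals[c][t] + self.alpha) / denom) for t in vocab}
        return self

    def predict(self, X):
        # argmax_c of log P(c) + sum_t x_t * log P(t | c); unknown terms add nothing.
        preds = []
        for x in X:
            scores = {c: self.log_prior[c]
                      + sum(v * self.log_lik[c].get(t, 0.0) for t, v in x.items())
                      for c in self.classes}
            preds.append(max(scores, key=scores.get))
        return preds

def evaluate(y_true, y_pred):
    # Accuracy plus macro-averaged precision, recall and F1 (unweighted means over classes).
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    prec, rec, f1 = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        prec.append(p_c)
        rec.append(r_c)
        f1.append(2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0)
    k = len(labels)
    return {"accuracy": sum(t == p for t, p in pairs) / len(pairs),
            "precision": sum(prec) / k, "recall": sum(rec) / k, "f1": sum(f1) / k}

def tune_alpha(docs, labels, grid=(0.01, 0.1, 1.0), k=5):
    # k-fold CV over a hypothetical alpha grid, scored by mean macro F1.
    # Folds are contiguous and unshuffled (a simplification); needs len(docs) >= k.
    bounds = [i * len(docs) // k for i in range(k + 1)]
    best = None
    for alpha in grid:
        fold_f1 = []
        for lo, hi in zip(bounds, bounds[1:]):
            train_docs, train_y = docs[:lo] + docs[hi:], labels[:lo] + labels[hi:]
            idf = fit_idf(train_docs)  # IDF is fit on the training folds only
            model = MultinomialNB(alpha).fit([tfidf(d, idf) for d in train_docs], train_y)
            pred = model.predict([tfidf(d, idf) for d in docs[lo:hi]])
            fold_f1.append(evaluate(labels[lo:hi], pred)["f1"])
        mean_f1 = sum(fold_f1) / k
        if best is None or mean_f1 > best[1]:
            best = (alpha, mean_f1)
    return best

if __name__ == "__main__":
    # Hypothetical toy corpus with two news categories (not the Detik.com data).
    docs = ["bola gol", "saham harga", "gol timnas", "harga rupiah"]
    labels = ["olahraga", "ekonomi", "olahraga", "ekonomi"]
    idf = fit_idf(docs)
    model = MultinomialNB(alpha=0.1).fit([tfidf(d, idf) for d in docs], labels)
    print(model.predict([tfidf("gol bola", idf)]))  # -> ['olahraga']
    print(tune_alpha(docs, labels, k=2))  # -> (0.01, 1.0)
```

In practice, scikit-learn's `TfidfVectorizer`, `MultinomialNB`, and `GridSearchCV` with `cv=5` cover the same steps; the stdlib version only makes each formula explicit. The other compared classifiers (Bernoulli Naïve Bayes, K-Nearest Neighbor, Random Forest, AdaBoost) would plug into the same cross-validation and evaluation loop but are not sketched here.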
License
Copyright (c) 2025 Andi Gunawan, Hendra Bunyamin

This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.