Building an Indonesian-Language Word Topic Dataset from Twitter Using TF-IDF & Cosine Similarity


Kristian Adi Nugraha
Danny Sebastian

Abstract

Social media is the most popular category of web application. Indonesians spend an average of 3 hours and 15 minutes a day on social media, which generates a substantial flow of information. Although information retrieval research using social media data is common, only a small share of it focuses on Indonesian-language content. This research aims to build an Indonesian-language topic dataset from Twitter data. We use TF-IDF to form the dataset and cosine similarity to group the tweets. In our tests, the system produced fairly accurate results, with the best result, 64%, obtained when processing tweets in batches of 200.
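The abstract names the two building blocks, TF-IDF weighting and cosine similarity grouping, without giving details. The sketch below is a minimal illustration of how they fit together, not the authors' method. Several details are assumptions: whitespace tokenization with no stemming or stopword removal, term frequency normalized by tweet length, IDF computed as ln(N/df), and a greedy rule that assigns each tweet to the first group whose seed tweet reaches a hypothetical similarity threshold of 0.3. All function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: minimal preprocessing (lowercase + whitespace split);
    # a real Indonesian pipeline would also remove stopwords, links, etc.
    return text.lower().split()


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of tweets containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    # Assumption: unsmoothed IDF = ln(N / df); terms in every tweet get 0.
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        # Term frequency normalized by tweet length, times IDF.
        vectors.append({t: (c / total) * idf[t] for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_tweets(docs, threshold=0.3):
    """Greedy grouping: join the first group whose seed is similar enough."""
    vectors = tf_idf_vectors(docs)
    groups = []  # each group is a list of tweet indices; index 0 is the seed
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            # No existing group is similar enough: start a new one.
            groups.append([i])
    return groups


tweets = [
    "banjir melanda jakarta",
    "banjir jakarta meluas",
    "timnas indonesia menang",
    "timnas indonesia juara",
]
print(group_tweets(tweets))  # [[0, 1], [2, 3]]
```

In this toy batch each pair of related tweets shares two terms and has a cosine similarity of exactly 1/3, while unrelated tweets share nothing and score 0, so the two topics separate cleanly. With real tweets the threshold, and the batch size (the abstract reports its best result at 200 tweets per batch), would need tuning.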


How to Cite
K. A. Nugraha and D. Sebastian, “Pembentukan Dataset Topik Kata Bahasa Indonesia pada Twitter Menggunakan TF-IDF & Cosine Similarity”, JuTISI, vol. 4, no. 3, pp. 376–386, Dec. 2018.