Clustering Sentipol Dataset Comments with Modified K-Means Clustering

Ruddy Cahyanto; Antonius Rachmat Chrismanto; Danny Sebastian

doi:10.28932/jutisi.v6i3.3006

Published: Dec 20, 2020

Ruddy Cahyanto

Universitas Kristen Duta Wacana

Antonius Rachmat Chrismanto

Universitas Kristen Duta Wacana

Danny Sebastian

Abstract

Clustering is a data mining technique that groups a data set into clusters of similar data. One of the most commonly used clustering algorithms is K-Means. However, K-Means has several weaknesses, one of which is the random selection of the initial centroids, which makes the cluster results inconsistent even when the algorithm is run on exactly the same data. The Modified K-Means algorithm focuses on selecting the initial centroids to overcome this inconsistency. The test was conducted on the Sentipol dataset and used only the comment data. The number of clusters was set to 3, matching the number of existing comment labels (positive, negative, and neutral). The test results show that the Modified K-Means algorithm produces a better purity value than the K-Means algorithm: Modified K-Means yields an average purity of 0.42, while K-Means yields an average purity of 0.391. In addition, in a test of the random factor repeated 5 times with the same attributes and test data, the cluster results of the Modified K-Means algorithm did not change, so the resulting purity value was also the same. With the K-Means algorithm, in contrast, the cluster results changed in every test, so the purity value is also likely to change.
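The abstract does not spell out how purity is computed or how Modified K-Means chooses its initial centroids. As a minimal sketch, the following assumes the standard definition of purity (the share of items that carry the majority label of their cluster) and uses a plain K-Means loop that takes its initial centroids as an argument. The toy points, the labels, and the fixed starting centroids are made up for illustration; they are not the Sentipol data and not the authors' initialization rule.

```python
import random
from collections import Counter


def purity(cluster_ids, labels):
    """Fraction of items whose label is the majority label of their cluster."""
    by_cluster = {}
    for c, y in zip(cluster_ids, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(labels)


def kmeans(points, init_centroids, max_iter=100):
    """Plain K-Means on numeric tuples, starting from the given centroids."""
    centroids = [tuple(c) for c in init_centroids]
    assign = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        new_assign = [
            min(range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_assign == assign:  # converged: assignments stopped changing
            break
        assign = new_assign
        # Move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


# Hypothetical toy data: three visible groups, with labels that only partly match them.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.3),
          (9.0, 0.2), (9.1, 0.0), (8.8, 0.4)]
labels = ["neg", "neg", "neu", "pos", "pos", "pos", "neu", "neu", "neg"]

# Assumption: a fixed, hand-picked initialization stands in for any deterministic rule.
fixed_init = [points[0], points[3], points[6]]
runs_fixed = [kmeans(points, fixed_init) for _ in range(5)]

# Random initialization: a different seed per run, as in repeated K-Means tests.
runs_random = [kmeans(points, random.Random(seed).sample(points, 3))
               for seed in range(5)]

print("fixed init: ", [round(purity(r, labels), 3) for r in runs_fixed])
print("random init:", [round(purity(r, labels), 3) for r in runs_random])
```

With a fixed initialization every run starts from the same state, so the assignments and the purity value cannot differ between runs. With random initialization the starting centroids depend on the seed, so the final clusters, and therefore the purity, may differ from run to run.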

How to Cite

[1]

R. Cahyanto, A. R. Chrismanto, and D. Sebastian, “Pengelompokan Komentar Dataset Sentipol dengan Modified K-Means Clustering”, JuTISI, vol. 6, no. 3, Dec. 2020.

Issue

Vol 6 No 3 (2020): JuTISI

Section

Articles

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium.
