BESKlus : BERT Extractive Summarization with K-Means Clustering in Scientific Paper
Main Article Content
Abstract
This study proposes a method and model for extractive text summarization based on contextual embeddings. The model combines a traditional machine learning algorithm, K-Means Clustering, with a recent BERT-based architecture, Sentence-BERT (SBERT). SBERT performs contextual embedding at the sentence level; the embedded sentences are then clustered, and each sentence's distance from its cluster centroid is calculated. The top sentences from each cluster are taken as summary candidates. The dataset used in this study is a collection of scientific papers from NeurIPS. Evaluation with ROUGE-L yields a score of 15.52% and a BERTScore of 85.55%, surpassing several previous models such as PyTextRank and BERT Extractive Summarizer. These results show that contextual embeddings work very well for extractive text summarization, which is generally performed at the sentence level.
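The pipeline the abstract describes (embed sentences, cluster the embeddings, pick the sentence nearest each centroid) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `embed` function below is a toy hashed bag-of-words stand-in for SBERT, and the k-means loop is a bare-bones version of the clustering step.

```python
import math
import random

def embed(sentence):
    # Toy stand-in for SBERT: a hashed bag-of-words vector. The actual
    # model would produce a contextual sentence embedding instead.
    vec = [0.0] * 16
    for word in sentence.lower().split():
        vec[hash(word) % 16] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def dist(a, b):
    # Euclidean distance between two embedding vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def kmeans(points, k, iters=20, seed=0):
    # Bare-bones K-Means: assign each point to its nearest centroid,
    # then recompute centroids, repeating for a fixed number of iterations.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        centroids = [mean(cl) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids

def summarize(sentences, k):
    # Embed each sentence, cluster the embeddings, and take the sentence
    # closest to each centroid as a summary candidate.
    embs = [embed(s) for s in sentences]
    centroids = kmeans(embs, k)
    picked = []
    for c in centroids:
        idx = min(range(len(sentences)), key=lambda i: dist(embs[i], c))
        if idx not in picked:
            picked.append(idx)
    # Preserve the original sentence order in the summary.
    return [sentences[i] for i in sorted(picked)]
```

In a real setting, `embed` would be replaced by a call to an SBERT model (e.g. via the `sentence-transformers` library), and the number of clusters `k` controls the summary length.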
Article Details
How to Cite
[1]
F. V. P. Samosir, H. Toba, and M. Ayub, “BESKlus : BERT Extractive Summarization with K-Means Clustering in Scientific Paper”, JuTISI, vol. 8, no. 1, pp. 202 –, Apr. 2022.
Section
Articles
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium.