Extended Vector Space Model with Semantic Relatedness on Java Archive Search Engine
Isi Artikel Utama
Abstrak
Byte code as information source is a novel approach which enable Java archive search engine to be built without relying on another resources except the Java archive itself [1]. Unfortunately, its effectiveness is not considerably high since some relevant documents may not be retrieved because of vocabulary mismatch. In this research, a vector space model (VSM) is extended with semantic relatedness to overcome vocabulary mismatch issue in Java archive search engine. Aiming the most effective retrieval model, some sort of equations in retrieval models are also proposed and evaluated such as sum up all related term, substituting non-existing term with most related term, logaritmic normalization, context-specific relatedness, and low-rank query-related retrieved documents. In general, semantic relatedness improves recall as a tradeoff of its precision reduction. We also proposed a scheme to take the advantage of relatedness without affected by its disadvantage (VSM + considering non-retrieved documents as low-rank retrieved documents using semantic relatedness). This scheme assures that relatedness score should be ranked lower than standard exact-match score. This scheme yields 1.754% higher effectiveness than our standard VSM.
Unduhan
Data unduhan belum tersedia.
Rincian Artikel
Cara Mengutip
[1]
O. Karnalim, “Extended Vector Space Model with Semantic Relatedness on Java Archive Search Engine”, JuTISI, vol. 1, no. 2, Agu 2015.
Terbitan
Bagian
Articles
This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial used, distribution and reproduction in any medium.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.