Efficient document retrieval system using locality sensitive hashing nearest neighbor algorithm and weighted jaccard distance for retrieving closest personalities

The process of retrieving significant documents based on the search key from a corpus has been a vital research problem in the information retrieval field. This paper proposes an efficient way to retrieve documents related to different personalities extracted from Wikipedia. The proposed method util...

Bibliographic Details
Main Authors: SE. Ben Georgea, G. Jeba Rosline, N. Balasupramanian, N.R. Wilfred Blessing
Format: Article
Language: English
Published: Penerbit Universiti Kebangsaan Malaysia 2024
Online Access: http://journalarticle.ukm.my/25559/
http://journalarticle.ukm.my/25559/1/kejut_19.pdf
building UKM Institutional Repository
collection Online Access
description The process of retrieving significant documents based on the search key from a corpus has been a vital research problem in the information retrieval field. This paper proposes an efficient way to retrieve documents related to different personalities extracted from Wikipedia. The proposed method utilizes the Locality Sensitive Hashing Nearest Neighbor algorithm combined with Weighted Jaccard Distance to measure document similarity with enhanced precision. This document retrieval system demonstrates competitive performance compared to existing methods in the Personality Identification domain. The introduction of a document centroid normalization technique significantly improves the effectiveness of information retrieval by enabling better discrimination between documents. The personality document search results were compared for different distance measures using performance metrics like Normalized Discounted Cumulative Gain and Mean Average Precision. The results presented in this paper show that the TF-IDF scheme with Locality Sensitive Hashing Nearest Neighbor Algorithm using the Weighted Jaccard Distance can yield superior retrieval efficiency when contrasted with alternative approaches found in the existing literature.
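The abstract names Weighted Jaccard Distance over TF-IDF weights but does not give its formula. As a minimal sketch, assuming the common definition (one minus the sum of per-term minima divided by the sum of per-term maxima), the distance between two weighted term vectors could be computed as below. The function names, the dict-based vector layout and the sample weights are illustrative assumptions, not taken from the paper, and the neighbor search shown is a plain exhaustive scan rather than the Locality Sensitive Hashing index the paper uses.

```python
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # term -> non-negative weight (e.g. TF-IDF); assumed layout


def weighted_jaccard_distance(x: Vector, y: Vector) -> float:
    """1 - sum(min(x_i, y_i)) / sum(max(x_i, y_i)) over the union of terms."""
    terms = set(x) | set(y)
    numerator = sum(min(x.get(t, 0.0), y.get(t, 0.0)) for t in terms)
    denominator = sum(max(x.get(t, 0.0), y.get(t, 0.0)) for t in terms)
    if denominator == 0:
        return 0.0  # both vectors are empty: treat them as identical
    return 1.0 - numerator / denominator


def nearest(query: Vector, corpus: Dict[str, Vector], k: int = 3) -> List[Tuple[str, float]]:
    """Exhaustive scan (not LSH): rank all documents by distance to the query."""
    scored = [(name, weighted_jaccard_distance(query, vec)) for name, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Hypothetical weights for illustration only.
corpus = {
    "doc_a": {"physics": 2.0, "nobel": 1.0},
    "doc_b": {"physics": 1.0, "nobel": 1.0, "chemistry": 2.0},
    "doc_c": {"football": 3.0},
}
ranking = nearest(corpus["doc_a"], corpus, k=3)
```

An LSH index would replace the exhaustive scan in `nearest` with a lookup restricted to candidate buckets; the distance function itself would stay the same.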
id oai:generic.eprints.org:25559
institution Universiti Kebangsaan Malaysia
institution_category Local University
recordtype eprints
repository_type Digital Repository
Citation: SE. Ben Georgea, G. Jeba Rosline, N. Balasupramanian and N.R. Wilfred Blessing (2024) Efficient document retrieval system using locality sensitive hashing nearest neighbor algorithm and weighted jaccard distance for retrieving closest personalities. Jurnal Kejuruteraan, 36 (4). pp. 1535-1543. ISSN 0128-0198 https://www.ukm.my/jkukm/volume-3604-2024/
Publication date: 2024-07
Status: Peer reviewed
File format: application/pdf