A new weighting scheme and discriminative approach for information retrieval in static and dynamic document collections

This paper introduces a new weighting scheme in information retrieval. It also proposes using the document centroid as a threshold for normalizing documents in a document collection. Document centroid normalization helps to achieve more effective information retrieval as it enables good discriminati...

Bibliographic Details
Main Authors: Ibrahim, Osman A. S., Landa-Silva, Dario
Format: Conference or Workshop Item
Published: 2014
Subjects: information retrieval
Online Access: https://eprints.nottingham.ac.uk/31329/
_version_ 1848794178392162304
author Ibrahim, Osman A. S.
Landa-Silva, Dario
building Nottingham Research Data Repository
collection Online Access
description This paper introduces a new weighting scheme in information retrieval. It also proposes using the document centroid as a threshold for normalizing documents in a document collection. Document centroid normalization helps to achieve more effective information retrieval as it enables good discrimination between documents. In the context of a machine learning application, namely unsupervised document indexing and retrieval, we compared the effectiveness of the proposed weighting scheme to the 'Term Frequency - Inverse Document Frequency' or TF-IDF, which is commonly used and considered as one of the best existing weighting schemes. The paper shows how the document centroid is used to remove less significant weights from documents and how this helps to achieve better retrieval effectiveness. Most of the existing weighting schemes in information retrieval research assume that the whole document collection is static. The results presented in this paper show that the proposed weighting scheme can produce higher retrieval effectiveness compared with the TF-IDF weighting scheme, in both static and dynamic document collections. The results also show the variation in information retrieval effectiveness that is achieved for static and dynamic document collections by using a specific weighting scheme. This type of comparison has not been presented in the literature before.
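The description above mentions TF-IDF as the baseline and the document centroid as a threshold for removing less significant weights. The record does not give the paper's formulas, so what follows is only a minimal sketch under stated assumptions: it uses the textbook TF-IDF weight (not the paper's proposed scheme), takes the centroid to be the per-term mean weight over all documents, and drops any weight that does not exceed the centroid value for its term. The function names `tf_idf`, `centroid` and `prune_below_centroid` are hypothetical and chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document (textbook TF-IDF)."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


def centroid(weights):
    """Assumed centroid: mean weight of each term over all documents.

    A term absent from a document contributes 0 to the mean.
    """
    n = len(weights)
    total = defaultdict(float)
    for w in weights:
        for t, v in w.items():
            total[t] += v
    return {t: v / n for t, v in total.items()}


def prune_below_centroid(weights):
    """Keep only the weights strictly above the centroid weight of their term.

    This thresholding rule is an assumption for illustration, not the
    paper's stated procedure.
    """
    c = centroid(weights)
    return [{t: v for t, v in w.items() if v > c[t]} for w in weights]


if __name__ == "__main__":
    docs = [["a", "b", "b"], ["a", "c"], ["a", "b"]]
    for pruned in prune_below_centroid(tf_idf(docs)):
        print(pruned)
```

In this toy collection the term "a" occurs in every document, so its TF-IDF weight is zero everywhere and it is pruned from all three documents, while the more discriminative terms "b" and "c" survive.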
first_indexed 2025-11-14T19:12:04Z
format Conference or Workshop Item
id nottingham-31329
institution University of Nottingham Malaysia Campus
institution_category Local University
last_indexed 2025-11-14T19:12:04Z
publishDate 2014
recordtype eprints
repository_type Digital Repository
spelling nottingham-31329 2020-05-04T20:13:27Z https://eprints.nottingham.ac.uk/31329/ 2014-09 Conference or Workshop Item PeerReviewed Ibrahim, Osman A. S. and Landa-Silva, Dario (2014) A new weighting scheme and discriminative approach for information retrieval in static and dynamic document collections. In: 14th UK Workshop on Computational Intelligence (UKCI2014), 8-10 September 2014, Bradford, West Yorkshire, UK.
information retrieval http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6930160&filter%3DAND%28p_IS_Number%3A6930143%29
title A new weighting scheme and discriminative approach for information retrieval in static and dynamic document collections
topic information retrieval
url https://eprints.nottingham.ac.uk/31329/