R-tfidf, A variety of TF-IDF term weighting strategy in document categorization

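Term weighting strategies play an essential role in text-processing tasks such as text categorization and information retrieval. In such systems, term frequency, inverse document frequency, and document length normalization are important factors to consider when developing a term weighting strategy. Document length normalization was proposed to give long and short documents an equal chance of being retrieved. However, terms in very short documents, which may be of little use to users (especially in Web information retrieval), can be assigned very high weights. As a result, short documents may be ranked above longer documents that are more relevant to users' information needs. In this research, a new term weighting strategy, R-tfidf, is proposed to alleviate these side effects of document length normalization. Experimental results show that the proposed approach can improve text categorization performance to some extent. © 2011 IEEE.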
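The abstract's point about document length normalization can be made concrete with classic TF-IDF plus cosine (L2) normalization. The sketch below is a textbook variant (raw term frequency, idf = log(N / df)), not the R-tfidf scheme proposed in the paper, whose formula is not given here. The toy corpus and the helper names `tfidf_cosine` and `rank` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tfidf_cosine(docs):
    """Classic TF-IDF with cosine (L2) length normalization.

    docs: list of token lists. Returns one {term: weight} dict per document.
    Textbook variant only; this is NOT the paper's R-tfidf weighting.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        raw = {t: f * math.log(n / df[t]) for t, f in tf.items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))  # document vector length
        vectors.append({t: (w / norm if norm else 0.0) for t, w in raw.items()})
    return vectors


def rank(query, vectors):
    """Return document indices sorted by dot product with a binary query."""
    scores = [sum(v.get(t, 0.0) for t in query) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy corpus: a one-word document, a longer on-topic document,
# and an unrelated document.
docs = [
    ["tfidf"],
    ["tfidf", "term", "weighting", "tfidf", "frequency",
     "length", "normalization", "categorization"],
    ["cooking", "recipe"],
]
vectors = tfidf_cosine(docs)
print(round(vectors[0]["tfidf"], 3))  # 1.0
print(round(vectors[1]["tfidf"], 3))  # 0.289
print(rank(["tfidf"], vectors))       # [0, 1, 2]
```

Because cosine normalization divides every weight by the vector length, a document with a single distinct term always gives that term weight 1.0, so the one-word document outranks the longer, more informative one for the query "tfidf". This is the kind of side effect the abstract attributes to document length normalization.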

Bibliographic Details
Main Authors: Zhu, Dengya; Xiao, J.
Format: Conference Paper
Published: 2011
Online Access: http://hdl.handle.net/20.500.11937/53558
Repository: Curtin Institutional Repository
Institution: Curtin University Malaysia
DOI: 10.1109/SKG.2011.44
Access: restricted