An empirical study of feature selection for text categorization based on term weightage

This paper proposes a local feature selection (FS) measure, namely Categorical Descriptor Term (CTD), for text categorization. It is derived from the classic term weighting scheme TFIDF. The method explicitly chooses a feature set for each category by selecting terms only from the relevant category...


Bibliographic Details
Main Authors: Bong, Chih How, Kulathuramaiyer, Narayanan
Format: Proceeding
Language: English
Published: Universiti Malaysia Sarawak (UNIMAS), 2004
Subjects:
Online Access: http://ir.unimas.my/id/eprint/1190/
http://ir.unimas.my/id/eprint/1190/1/An%2Bemperical%2Bstudy%2Bof%2Bfeature%2Bselection%2Bfor%2BTEXT%2Bcategorization%2Bbased%2Bon%2Bterm%2Bweightage%2B%2528%2Babstract%2529.pdf
author Bong, Chih How
Kulathuramaiyer, Narayanan
building UNIMAS Institutional Repository
collection Online Access
description This paper proposes a local feature selection (FS) measure, namely Categorical Descriptor Term (CTD), for text categorization. It is derived from the classic term weighting scheme TFIDF. The method explicitly chooses a feature set for each category by selecting terms only from the relevant category. Although past literature has suggested that using features from irrelevant categories can improve text categorization performance, we believe that incorporating only relevant features can be highly effective. An experimental comparison is carried out between CTD and five well-known feature selection measures: Information Gain, Chi-Square, Correlation Coefficient, Odds Ratio and GSS Coefficient. The results show that our proposed method performs comparably to the other FS measures, especially on collections with highly overlapping topics.
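The idea of local feature selection described above can be illustrated with a minimal sketch. This record does not give the CTD formula, so the score below is a generic TF-IDF-style weight chosen as an assumption for illustration only; the function name `select_local_features`, the parameter `k`, and the toy corpus are all hypothetical. The one property it shares with the description is that each category's feature set is drawn only from terms occurring in that category's own documents.

```python
import math
from collections import Counter, defaultdict


def select_local_features(docs, k):
    """Pick the top-k terms per category from that category's own documents.

    docs: list of (category, tokens) pairs.
    Score = (term count inside the category) * log(N / document frequency).
    This is an illustrative TF-IDF-style weight, NOT the paper's CTD measure.
    """
    n_docs = len(docs)
    doc_freq = Counter()           # number of documents containing each term
    cat_tf = defaultdict(Counter)  # term counts within each category
    for category, tokens in docs:
        doc_freq.update(set(tokens))
        cat_tf[category].update(tokens)

    selected = {}
    for category, tf in cat_tf.items():
        # Only terms seen in this category are candidates (local selection).
        scores = {t: f * math.log(n_docs / doc_freq[t]) for t, f in tf.items()}
        # Sort by descending score, then alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        selected[category] = ranked[:k]
    return selected


# Hypothetical toy corpus; "team" appears in both categories.
docs = [
    ("sports", ["match", "goal", "team", "goal"]),
    ("sports", ["team", "coach", "goal"]),
    ("finance", ["market", "stock", "team"]),
    ("finance", ["stock", "bank", "market", "stock"]),
]
features = select_local_features(docs, k=2)
```

In this toy corpus the term "team" occurs in three of the four documents, so its inverse document frequency is low and it is not selected for either category, while terms concentrated in one category rank highest.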
format Proceeding
id unimas-1190
institution Universiti Malaysia Sarawak
institution_category Local University
language English
publishDate 2004
publisher Universiti Malaysia Sarawak (UNIMAS)
recordtype eprints
repository_type Digital Repository
citation Bong, Chih How and Kulathuramaiyer, Narayanan (2004) An empirical study of feature selection for text categorization based on term weightage. In: 2004 IEEE/WIC/ACM International Conference on Web Intelligence. Status: NonPeerReviewed.
title An empirical study of feature selection for text categorization based on term weightage
topic AC Collections. Series. Collected works
T Technology (General)