An empirical study of feature selection for text categorization based on term weightage

This paper proposes a local feature selection (FS) measure, the Categorical Descriptor Term (CTD), for text categorization. It is derived from the classic term weighting scheme TFIDF. The method explicitly chooses a feature set for each category by selecting terms only from the relevant category...
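
This record does not give the CTD formula, so the following is only a rough sketch of the general idea in the abstract: score terms with a TFIDF-style weight and build a separate feature set for each category from that category's own documents. The function name `select_local_features`, the weight `tf * log(N / df)`, and the toy corpus are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def select_local_features(docs, labels, k):
    """Return {category: top-k terms}, using only each category's own documents.

    Hypothetical illustration: the weight tf * log(N / df) is a generic
    TFIDF variant, not the CTD measure defined in the paper.
    """
    n_docs = len(docs)

    # Document frequency of each term over the whole collection (IDF part).
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    # Term frequency accumulated separately for each category (TF part).
    tf_by_cat = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        tf_by_cat[label].update(tokens)

    selected = {}
    for cat, tf in tf_by_cat.items():
        scores = {t: f * math.log(n_docs / df[t]) for t, f in tf.items()}
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        selected[cat] = ranked[:k]
    return selected


# Toy corpus (made up): two categories that share the term "team".
docs = [
    ["ball", "goal", "team"],
    ["team", "goal", "goal"],
    ["vote", "party", "team"],
    ["vote", "election", "vote"],
]
labels = ["sport", "sport", "politics", "politics"]
print(select_local_features(docs, labels, k=2))
```

In this sketch a term such as "team", which occurs in both categories, receives a low IDF and drops out of both feature sets, while each category keeps terms drawn from its own documents only.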

Bibliographic Details
Main Authors: Bong, Chih How; Kulathuramaiyer, Narayanan
Format: Conference or Workshop Item
Language: English
Published: Universiti Malaysia Sarawak (UNIMAS), 2004
Subjects: AC Collections. Series. Collected works; T Technology (General)
Online Access: http://ir.unimas.my/1190/
http://ir.unimas.my/1190/1/An%2Bemperical%2Bstudy%2Bof%2Bfeature%2Bselection%2Bfor%2BTEXT%2Bcategorization%2Bbased%2Bon%2Bterm%2Bweightage%2B%2528%2Babstract%2529.pdf
id unimas-1190
recordtype eprints
spelling unimas-1190 2015-03-24T01:04:36Z Bong, Chih How and Kulathuramaiyer, Narayanan (2004) An empirical study of feature selection for text categorization based on term weightage. In: 2004 IEEE/WIC/ACM International Conference on Web Intelligence. Conference or Workshop Item, NonPeerReviewed.
repository_type Digital Repository
institution_category Local University
institution Universiti Malaysia Sarawak
building UNIMAS Institutional Repository
collection Online Access
topic AC Collections. Series. Collected works
T Technology (General)
description This paper proposes a local feature selection (FS) measure, the Categorical Descriptor Term (CTD), for text categorization. It is derived from the classic term weighting scheme TFIDF. The method explicitly chooses a feature set for each category by selecting terms only from the relevant category. Although past literature has suggested that using features from irrelevant categories can improve text categorization performance, we believe that incorporating only relevant features can be highly effective. The experimental comparison is carried out between CTD and five well-known feature selection measures: Information Gain, Chi-Square, Correlation Coefficient, Odds Ratio and GSS Coefficient. The results show that the proposed method performs comparably well against the other FS measures, especially on collections with highly overlapping topics.