An empirical study of feature selection for text categorization based on term weightage
This paper proposes a local feature selection (FS) measure, namely Categorical Descriptor Term (CTD), for text categorization. It is derived from the classic term weighting scheme, TF-IDF. The method explicitly chooses a feature set for each category by selecting terms only from the relevant category...
Main Authors: | Bong, Chih How; Kulathuramaiyer, Narayanan |
---|---|
Format: | Conference or Workshop Item |
Language: | English |
Published: | Universiti Malaysia Sarawak, (UNIMAS) 2004 |
Subjects: | AC Collections. Series. Collected works; T Technology (General) |
Online Access: | http://ir.unimas.my/1190/ http://ir.unimas.my/1190/1/An%2Bemperical%2Bstudy%2Bof%2Bfeature%2Bselection%2Bfor%2BTEXT%2Bcategorization%2Bbased%2Bon%2Bterm%2Bweightage%2B%2528%2Babstract%2529.pdf |
id |
unimas-1190 |
---|---|
recordtype |
eprints |
spelling |
unimas-1190 2015-03-24T01:04:36Z http://ir.unimas.my/1190/ An empirical study of feature selection for text categorization based on term weightage Bong, Chih How Kulathuramaiyer, Narayanan AC Collections. Series. Collected works T Technology (General) This paper proposes a local feature selection (FS) measure, namely Categorical Descriptor Term (CTD), for text categorization. It is derived from the classic term weighting scheme, TF-IDF. The method explicitly chooses a feature set for each category by selecting terms only from the relevant category. Although past literature has suggested that using features from irrelevant categories can improve text categorization performance, we believe that incorporating only relevant features can be highly effective. The experimental comparison is carried out between CTD and five well-known feature selection measures: Information Gain, Chi-Square, Correlation Coefficient, Odds Ratio and GSS Coefficient. The results also show that our proposed method can perform comparably well with other FS measures, especially on collections with highly overlapping topics. Universiti Malaysia Sarawak, (UNIMAS) 2004 Conference or Workshop Item NonPeerReviewed text en http://ir.unimas.my/1190/1/An%2Bemperical%2Bstudy%2Bof%2Bfeature%2Bselection%2Bfor%2BTEXT%2Bcategorization%2Bbased%2Bon%2Bterm%2Bweightage%2B%2528%2Babstract%2529.pdf Bong, Chih How and Kulathuramaiyer, Narayanan (2004) An empirical study of feature selection for text categorization based on term weightage. In: 2004 IEEE/WIC/ACM International Conference on Web Intelligence. |
repository_type |
Digital Repository |
institution_category |
Local University |
institution |
Universiti Malaysia Sarawak |
building |
UNIMAS Institutional Repository |
collection |
Online Access |
language |
English |
topic |
AC Collections. Series. Collected works T Technology (General) |
description |
This paper proposes a local feature selection (FS) measure, namely Categorical Descriptor Term (CTD), for text categorization. It is derived from the classic term weighting scheme, TF-IDF. The method explicitly chooses a feature set for each category by selecting terms only from the relevant category. Although past literature has suggested that using features from irrelevant categories can improve text categorization performance, we believe that incorporating only relevant features can be highly effective. The experimental comparison is carried out between CTD and five well-known feature selection measures: Information Gain, Chi-Square, Correlation Coefficient, Odds Ratio and GSS Coefficient. The results also show that our proposed method can perform comparably well with other FS measures, especially on collections with highly overlapping topics. |
format |
Conference or Workshop Item |
author |
Bong, Chih How Kulathuramaiyer, Narayanan |
author_sort |
Bong, Chih How |
title |
An empirical study of feature selection for text categorization based on term weightage |
title_sort |
empirical study of feature selection for text categorization based on term weightage |
publisher |
Universiti Malaysia Sarawak, (UNIMAS) |
publishDate |
2004 |
url |
http://ir.unimas.my/1190/ http://ir.unimas.my/1190/1/An%2Bemperical%2Bstudy%2Bof%2Bfeature%2Bselection%2Bfor%2BTEXT%2Bcategorization%2Bbased%2Bon%2Bterm%2Bweightage%2B%2528%2Babstract%2529.pdf |
first_indexed |
2018-09-06T14:41:51Z |
last_indexed |
2018-09-06T14:41:51Z |
_version_ |
1610869544447901696 |
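
The abstract describes a local feature selection scheme in which each category receives its own feature set, drawn only from terms that occur in that category and scored with a TF-IDF-derived weight. The record does not give the CTD formula, so the following is only a minimal sketch of that general idea under stated assumptions: the function name `local_tfidf_features` and the score `tf(t, c) * log(N / df(t))` are illustrative choices, not the authors' measure.

```python
import math
from collections import Counter


def local_tfidf_features(docs, labels, k):
    """Return the top-k terms per category, drawn only from that category.

    Assumed score (illustrative, NOT the paper's CTD formula):
        score(t, c) = tf(t, c) * log(N / df(t))
    tf(t, c): occurrences of term t in documents labelled c
    df(t):    number of documents (any category) that contain t
    N:        total number of documents
    """
    n_docs = len(docs)
    df = Counter()      # document frequency over the whole collection
    tf_by_cat = {}      # per-category term frequencies

    for tokens, label in zip(docs, labels):
        df.update(set(tokens))
        tf_by_cat.setdefault(label, Counter()).update(tokens)

    selected = {}
    for label, tf in tf_by_cat.items():
        # Only terms seen in this category are candidates (local selection).
        scores = {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}
        # Sort by descending score; break ties alphabetically for stability.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        selected[label] = ranked[:k]
    return selected


if __name__ == "__main__":
    # Toy, made-up collection of tokenized documents.
    docs = [
        ["ball", "goal", "match"],
        ["goal", "team", "match"],
        ["vote", "party", "match"],
        ["vote", "election", "party"],
    ]
    labels = ["sport", "sport", "politics", "politics"]
    for category, terms in local_tfidf_features(docs, labels, k=3).items():
        print(category, terms)
```

In this toy collection the term "match" appears in both categories and in three of the four documents, so its inverse document frequency is low and it falls outside the top three for either category. A global measure such as Chi-Square would instead score every term against every category, including terms that never occur in it.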