Feature Selection Based on Semantics

The need for automated text categorization is driven by the rapid growth in the number of digital documents. This paper examines feature selection, one of the main processes in text categorization. The feature selection approach is based on semantics and employs WordNet [1]. The proposed Wo...


Bibliographic Details
Main Authors: Chua, Stephanie, Kulathuramaiyer, Narayanan
Format: Article
Language: English
Published: Springer Netherlands 2008
Subjects: LB Theory and practice of education; T Technology (General)
Online Access: http://ir.unimas.my/id/eprint/531/
http://ir.unimas.my/id/eprint/531/1/feature_selection_based_on_semantics.pdf
author Chua, Stephanie
Kulathuramaiyer, Narayanan
building UNIMAS Institutional Repository
collection Online Access
description The need for automated text categorization is driven by the rapid growth in the number of digital documents. This paper examines feature selection, one of the main processes in text categorization. The feature selection approach is based on semantics and employs WordNet [1]. The proposed WordNet-based feature selection approach uses synonymous nouns and dominant senses to select terms that reflect a category’s content. Experiments are carried out on the ten most populated categories of the Reuters-21578 dataset. The results show that the statistical feature selection approaches, Chi-Square and Information Gain, produce better results when used with the WordNet-based feature selection approach. Combining the WordNet-based feature selection approach with statistical weighting yields a set of terms that is more meaningful than the terms chosen by the statistical approaches alone. The WordNet-based feature selection method also effectively reduces the dimensionality of the feature space.
first_indexed 2025-11-15T05:53:56Z
format Article
id unimas-531
institution Universiti Malaysia Sarawak
institution_category Local University
language English
last_indexed 2025-11-15T05:53:56Z
publishDate 2008
publisher Springer Netherlands
recordtype eprints
repository_type Digital Repository
spelling unimas-531; 2015-03-23T08:11:08Z; http://ir.unimas.my/id/eprint/531/; Feature Selection Based on Semantics; Chua, Stephanie; Kulathuramaiyer, Narayanan; LB Theory and practice of education; T Technology (General); Springer Netherlands; 2008; Article; NonPeerReviewed; text; en; http://ir.unimas.my/id/eprint/531/1/feature_selection_based_on_semantics.pdf; Chua, Stephanie and Kulathuramaiyer, Narayanan (2008) Feature Selection Based on Semantics. Innovations and Advanced Techniques in Systems, Computing Sciences and Software Engineering. pp. 471-476. http://ir.unimas.my/531/1/feature_selection_based_on_semantics.pdf
title Feature Selection Based on Semantics
topic LB Theory and practice of education
T Technology (General)
url http://ir.unimas.my/id/eprint/531/
http://ir.unimas.my/id/eprint/531/1/feature_selection_based_on_semantics.pdf
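
Illustrative note on the description field: the abstract names Chi-Square as one of the statistical feature selection approaches used alongside the WordNet-based method. The sketch below shows the standard Chi-Square term/category statistic computed from document counts. It is not the authors' implementation; the function names (chi_square, score_terms) and the four-document toy corpus are assumptions made up for illustration and are not taken from Reuters-21578 or from the paper.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-Square statistic for one term and one category.

    n11: documents in the category that contain the term
    n10: documents outside the category that contain the term
    n01: documents in the category that lack the term
    n00: documents outside the category that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        # A marginal count is zero, so the statistic is undefined; treat as 0.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def score_terms(docs, category):
    """Return {term: chi-square score} for the given category.

    docs is a list of (tokens, label) pairs (hypothetical input format).
    """
    n_docs = len(docs)
    n_in_cat = sum(1 for _, label in docs if label == category)
    df_total = Counter()  # document frequency over all documents
    df_cat = Counter()    # document frequency inside the category
    for tokens, label in docs:
        for term in set(tokens):
            df_total[term] += 1
            if label == category:
                df_cat[term] += 1
    scores = {}
    for term, df in df_total.items():
        n11 = df_cat[term]
        n10 = df - n11
        n01 = n_in_cat - n11
        n00 = n_docs - n_in_cat - n10
        scores[term] = chi_square(n11, n10, n01, n00)
    return scores


# Toy corpus, invented for this sketch only.
toy_docs = [
    (["wheat", "grain", "export"], "grain"),
    (["grain", "harvest", "wheat"], "grain"),
    (["oil", "crude", "export"], "crude"),
    (["crude", "oil", "price"], "crude"),
]
scores = score_terms(toy_docs, "grain")
for term in sorted(scores, key=lambda t: (-scores[t], t)):
    print(term, round(scores[term], 3))
```

In this toy corpus "export" appears equally often inside and outside the "grain" category and scores 0, while "wheat" scores highest. The statistic is symmetric, so "oil", which appears only outside the category, scores just as high as "wheat"; a selection step built on it would typically need to check the direction of the association as well.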