Text Categorization Using an Automatically Generated Labelled Dataset: An Evaluation Study
Naïve Bayes (NB), kNN, and AdaBoost are three commonly used text classifiers. Evaluating these classifiers involves a variety of factors, including the benchmark used, feature selection, the parameter settings of the algorithms, and the measurement criteria employed. Researchers have demonstra...
| Main Authors: | Zhu, Dengya; Wong, K. |
|---|---|
| Other Authors: | Chu Kiong Loo |
| Format: | Conference Paper |
| Published: | Springer International Publishing, 2014 |
| Subjects: | feature selection; text classifiers; Text categorization |
| Online Access: | http://hdl.handle.net/20.500.11937/26799 |
| _version_ | 1848752089109364736 |
|---|---|
| author | Zhu, Dengya; Wong, K. |
| author2 | Chu Kiong Loo |
| author_facet | Chu Kiong Loo; Zhu, Dengya; Wong, K. |
| author_sort | Zhu, Dengya |
| building | Curtin Institutional Repository |
| collection | Online Access |
| description | Naïve Bayes (NB), kNN, and AdaBoost are three commonly used text classifiers. Evaluating these classifiers involves a variety of factors, including the benchmark used, feature selection, the parameter settings of the algorithms, and the measurement criteria employed. Researchers have demonstrated that some algorithms outperform others on some corpora; however, labelling and corpus bias are two concerns in text categorization. This paper evaluates the three classifiers on an automatically generated text document set labelled by a group of experts to alleviate the subjectiveness of labelling, and at the same time examines how the performance of the algorithms is influenced by the feature selection algorithm and the number of features selected. |
| first_indexed | 2025-11-14T08:03:04Z |
| format | Conference Paper |
| id | curtin-20.500.11937-26799 |
| institution | Curtin University Malaysia |
| institution_category | Local University |
| last_indexed | 2025-11-14T08:03:04Z |
| publishDate | 2014 |
| publisher | Springer International Publishing |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | curtin-20.500.11937-267992023-02-27T07:34:30Z Text Categorization Using an Automatically Generated Labelled Dataset: An Evaluation Study Zhu, Dengya Wong, K. Chu Kiong Loo Keem Siah Yap Kok Wai Wong Andrew Teoh Kaizhu Huang feature selection text classifiers Text categorization Naïve Bayes(NB), kNN and Adaboost are three commonly used text classifiers. Evaluation of these classifiers involves a variety of factors to be considered including benchmark used, feature selections, parameter settings of algorithms, and the measurement criteria employed. Researchers have demonstrated that some algorithms outperform others on some corpus, however, labeling and corpus bias are two concerns in text categorization. This paper focuses on evaluating the three commonly used text classifiers by using an automatically generated text document set which is labelled by a group of experts to alleviate subjectiveness of labelling, and at the same time to examine how the performance of the algorithms is influenced by feature selection algorithms and the number of features selected. 2014 Conference Paper http://hdl.handle.net/20.500.11937/26799 10.1007/978-3-319-12637-1_60 Springer International Publishing restricted |
| title | Text Categorization Using an Automatically Generated Labelled Dataset: An Evaluation Study |
| topic | feature selection; text classifiers; Text categorization |
| url | http://hdl.handle.net/20.500.11937/26799 |