An evaluation study on text categorization using automatically generated labeled dataset
Naïve Bayes, k-nearest neighbors, AdaBoost, support vector machines, and neural networks are five commonly used text classifiers. Evaluating these classifiers involves a variety of factors, including the benchmark used, feature selection, algorithm parameter settings, and the measurement criteria employed. Researchers have demonstrated that some algorithms outperform others on particular corpora; however, the inconsistency of human labeling and the high dimensionality of feature spaces are two issues that remain to be addressed in text categorization. This paper evaluates the five commonly used text classifiers on an automatically generated text document collection, labeled by a group of experts to alleviate the subjectivity of human category assignments, and at the same time examines the influence of the number of features on the performance of the algorithms.
| Main Authors: | Zhu, Dengya; Wong, K. |
|---|---|
| Format: | Journal Article |
| Published: | Elsevier BV, 2017 |
| Online Access: | http://hdl.handle.net/20.500.11937/53578 |
| DOI: | 10.1016/j.neucom.2016.04.072 |
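The evaluation protocol the abstract describes — rank features, keep the top k, train a classifier, and compare performance as k varies — can be sketched in plain Python. Everything below is an illustrative stand-in: the toy two-class corpus, the chi-square term ranking, and the multinomial naive Bayes classifier are assumptions for demonstration, not the paper's actual dataset or the full set of five classifiers it evaluates.

```python
import math
from collections import Counter

# Toy labeled corpus standing in for the paper's automatically
# generated, expert-labeled collection (illustrative only).
train = [
    ("the team won the match", "sport"),
    ("players scored a goal in the game", "sport"),
    ("the coach praised the team after the match", "sport"),
    ("fans cheered the winning goal", "sport"),
    ("the new processor runs fast code", "tech"),
    ("developers wrote code for the software", "tech"),
    ("the software update fixed the processor bug", "tech"),
    ("engineers tested the new code release", "tech"),
]
test = [
    ("the team scored a late goal", "sport"),
    ("the code update shipped to developers", "tech"),
]

def tokens(doc):
    return doc.split()

def chi2_rank(train):
    """Rank vocabulary terms by chi-square association with the labels."""
    n = len(train)
    classes = {c for _, c in train}
    vocab = {t for d, _ in train for t in tokens(d)}
    scores = {}
    for t in vocab:
        best = 0.0
        for c in classes:
            a = sum(1 for d, y in train if y == c and t in tokens(d))
            b = sum(1 for d, y in train if y != c and t in tokens(d))
            cc = sum(1 for _, y in train if y == c) - a
            dd = sum(1 for _, y in train if y != c) - b
            num = n * (a * dd - b * cc) ** 2
            den = (a + b) * (cc + dd) * (a + cc) * (b + dd)
            if den:
                best = max(best, num / den)
        scores[t] = best
    # Sort alphabetically first so score ties break deterministically.
    return sorted(sorted(vocab), key=lambda t: scores[t], reverse=True)

def train_nb(train, vocab):
    """Multinomial naive Bayes with Laplace smoothing over `vocab`."""
    classes = sorted({c for _, c in train})
    prior = {c: math.log(sum(1 for _, y in train if y == c) / len(train))
             for c in classes}
    counts = {c: Counter() for c in classes}
    for d, y in train:
        counts[y].update(t for t in tokens(d) if t in vocab)
    loglik = {}
    for c in classes:
        total = sum(counts[c].values()) + len(vocab)
        loglik[c] = {t: math.log((counts[c][t] + 1) / total) for t in vocab}
    def predict(doc):
        return max(classes, key=lambda c: prior[c] + sum(
            loglik[c][t] for t in tokens(doc) if t in vocab))
    return predict

ranked = chi2_rank(train)
results = {}
for k in (2, 8, len(ranked)):      # vary the number of selected features
    vocab = set(ranked[:k])
    predict = train_nb(train, vocab)
    acc = sum(predict(d) == y for d, y in test) / len(test)
    results[k] = acc
    print(f"features={k:2d} accuracy={acc:.2f}")
```

Tabulating accuracy against k in this way is one simple form of the comparison the study performs: it exposes how sensitive each classifier is to the dimensionality of the feature space.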