An evaluation study on text categorization using automatically generated labeled dataset

Naïve Bayes, k-nearest neighbors, AdaBoost, support vector machines, and neural networks are five of the most commonly used text classifiers. Evaluating these classifiers involves a variety of factors, including the benchmark used, feature selection, algorithm parameter settings, and the measurement criteria employed. Researchers have demonstrated that some algorithms outperform others on particular corpora; however, inconsistency of human labeling and the high dimensionality of feature spaces remain two issues to be addressed in text categorization. This paper evaluates the five commonly used text classifiers on an automatically generated text document collection, labeled by a group of experts to alleviate the subjectivity of human category assignments, and at the same time examines the influence of the number of features on the performance of the algorithms.
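The abstract describes an evaluation loop: train a text classifier at several vocabulary sizes and measure how accuracy varies with the number of features. A minimal sketch of that loop is below, assuming a toy two-class corpus; the frequency-based feature selection and the multinomial Naive Bayes implementation are illustrative stand-ins, not the paper's actual benchmark or setup.

```python
# Sketch: vary the number of selected features and measure classifier accuracy.
# The corpus, the frequency-based selection, and the Naive Bayes model are
# hypothetical stand-ins for the study's actual dataset and classifiers.
import math
from collections import Counter, defaultdict

train = [
    ("the striker scored a late goal", "sport"),
    ("the team won the match after extra time", "sport"),
    ("the central bank raised interest rates", "finance"),
    ("stocks fell as the market reacted to rates", "finance"),
]
test = [
    ("the match ended with a goal", "sport"),
    ("the bank cut rates for the market", "finance"),
]

def top_k_features(docs, k):
    # Feature selection: keep the k most frequent tokens as the vocabulary.
    counts = Counter(tok for text, _ in docs for tok in text.split())
    return {tok for tok, _ in counts.most_common(k)}

def train_nb(docs, vocab):
    # Multinomial Naive Bayes with Laplace smoothing over the chosen vocabulary.
    prior, word_counts = Counter(), defaultdict(Counter)
    for text, label in docs:
        prior[label] += 1
        for tok in text.split():
            if tok in vocab:
                word_counts[label][tok] += 1

    def predict(text):
        def score(label):
            total = sum(word_counts[label].values())
            s = math.log(prior[label] / sum(prior.values()))
            for tok in text.split():
                if tok in vocab:
                    s += math.log((word_counts[label][tok] + 1) /
                                  (total + len(vocab)))
            return s
        return max(prior, key=score)
    return predict

for k in (5, 10, 20):  # vary the number of features, as the study does
    vocab = top_k_features(train, k)
    predict = train_nb(train, vocab)
    acc = sum(predict(t) == y for t, y in test) / len(test)
    print(f"k={k:2d} features -> accuracy {acc:.2f}")
```

The same loop generalizes to the other four classifiers by swapping `train_nb` for a different training function while holding the feature-selection step fixed, which isolates the effect of vocabulary size on each algorithm.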

Bibliographic Details
Main Authors: Zhu, Dengya; Wong, K.
Format: Journal Article
Published: Elsevier BV 2017
Online Access: http://hdl.handle.net/20.500.11937/53578
Repository: Curtin Institutional Repository
Institution: Curtin University Malaysia
DOI: 10.1016/j.neucom.2016.04.072
Access: restricted