Text Categorization Using an Automatically Generated Labelled Dataset: An Evaluation Study

Bibliographic Details
Main Authors:	Zhu, Dengya, Wong, K.
Other Authors:	Chu Kiong Loo
Format:	Conference Paper
Published:	Springer International Publishing 2014
Subjects:	feature selection text classifiers Text categorization
Online Access:	http://hdl.handle.net/20.500.11937/26799

Description
Summary:	Naïve Bayes (NB), kNN and AdaBoost are three commonly used text classifiers. Evaluating these classifiers involves a variety of factors, including the benchmark used, feature selection, the parameter settings of the algorithms, and the measurement criteria employed. Researchers have demonstrated that some algorithms outperform others on some corpora; however, labelling bias and corpus bias are two concerns in text categorization. This paper evaluates the three classifiers on an automatically generated text document set, labelled by a group of experts to alleviate the subjectivity of labelling, and at the same time examines how the performance of the algorithms is influenced by the feature selection algorithm and the number of features selected.
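To illustrate how "the number of features selected" enters such an evaluation, here is a minimal sketch. It is not the authors' method. The record does not say which feature selection algorithms or classifier implementations the paper used, so this sketch assumes chi-square term scoring, keeps the top k terms, and trains a multinomial Naïve Bayes classifier with add-one smoothing. kNN and AdaBoost are not shown. The toy corpus and all function names (`chi_square_scores`, `top_k_features`, `train_nb`, `predict_nb`, `accuracy_for_k`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not data from the paper): tokenised documents and labels.
TRAIN_DOCS = [
    "cheap pills buy now".split(),
    "buy cheap watches now".split(),
    "meeting agenda for monday".split(),
    "project meeting notes monday".split(),
]
TRAIN_LABELS = ["spam", "spam", "ham", "ham"]
TEST_SET = [("buy cheap now".split(), "spam"), ("monday meeting".split(), "ham")]


def chi_square_scores(docs, labels):
    """Score each term by its maximum chi-square statistic over the classes,
    using a document-frequency 2x2 contingency table per (term, class) pair."""
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    class_sizes = Counter(labels)
    scores = {}
    for term in set().union(*doc_sets):
        best = 0.0
        for cls in class_sizes:
            a = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == cls)  # present, in class
            b = sum(1 for s, y in zip(doc_sets, labels) if term in s and y != cls)  # present, other classes
            c = class_sizes[cls] - a  # term absent, in class
            d = n - a - b - c         # term absent, other classes
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - c * b) ** 2 / denom)
        scores[term] = best
    return scores


def top_k_features(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    return set(sorted(scores, key=lambda t: (-scores[t], t))[:k])


def train_nb(docs, labels, features):
    """Multinomial Naive Bayes with add-one smoothing over the selected features only."""
    priors = Counter(labels)
    counts = {cls: Counter() for cls in priors}
    for doc, y in zip(docs, labels):
        counts[y].update(t for t in doc if t in features)
    model = {}
    for cls in priors:
        total = sum(counts[cls].values())
        loglik = {t: math.log((counts[cls][t] + 1) / (total + len(features))) for t in features}
        model[cls] = (math.log(priors[cls] / len(docs)), loglik)
    return model


def predict_nb(model, doc):
    """Return the class with the highest log-posterior; unselected terms are ignored."""
    def log_posterior(cls):
        log_prior, loglik = model[cls]
        return log_prior + sum(loglik[t] for t in doc if t in loglik)
    return max(sorted(model), key=log_posterior)


def accuracy_for_k(k):
    """Select k features on the training set, train NB, and score the test set."""
    features = top_k_features(chi_square_scores(TRAIN_DOCS, TRAIN_LABELS), k)
    model = train_nb(TRAIN_DOCS, TRAIN_LABELS, features)
    hits = sum(predict_nb(model, doc) == gold for doc, gold in TEST_SET)
    return hits / len(TEST_SET)


if __name__ == "__main__":
    for k in (1, 3, 5):
        print(f"k={k}: accuracy={accuracy_for_k(k):.2f}")
```

Sweeping k while holding the classifier fixed isolates the effect of feature-set size. Swapping `chi_square_scores` for another scoring function, such as information gain, would isolate the effect of the selection algorithm in the same way.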
