Text Categorization Using an Automatically Generated Labelled Dataset: An Evaluation Study

Naïve Bayes (NB), kNN and AdaBoost are three commonly used text classifiers. Evaluating these classifiers involves a variety of factors, including the benchmark used, the feature selection method, the parameter settings of the algorithms, and the measurement criteria employed. Researchers have demonstrated that some algorithms outperform others on some corpora; however, labelling bias and corpus bias are two concerns in text categorization. This paper evaluates the three classifiers using an automatically generated text document set, labelled by a group of experts to alleviate the subjectiveness of labelling, and at the same time examines how the performance of the algorithms is influenced by the feature selection algorithm and the number of features selected.
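
The abstract describes comparing Naïve Bayes, kNN and AdaBoost under different feature selection algorithms and numbers of selected features. The paper's expert-labelled dataset and exact settings are not included in this record, so the sketch below is only an illustration of that style of experiment: it assumes scikit-learn implementations of the three classifiers, chi-square feature selection, macro-F1 as the measurement criterion, and the public 20 Newsgroups corpus as a stand-in dataset.

```python
# Illustrative sketch only (not the authors' code or dataset): compares the three
# classifiers named in the abstract while varying how many chi-square-selected
# features each one sees.
from sklearn.datasets import fetch_20newsgroups            # public stand-in corpus
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Stand-in for the paper's automatically generated, expert-labelled document set.
data = fetch_20newsgroups(
    subset="train",
    categories=["rec.autos", "sci.space", "talk.politics.misc"],
    remove=("headers", "footers", "quotes"),
)

classifiers = {
    "Naive Bayes": MultinomialNB(),
    "kNN": KNeighborsClassifier(n_neighbors=15),            # k is an arbitrary choice
    "AdaBoost": AdaBoostClassifier(n_estimators=100),       # default decision stumps
}

for n_features in (500, 2000, 5000):                        # illustrative feature counts
    for name, clf in classifiers.items():
        pipe = Pipeline([
            ("tfidf", TfidfVectorizer(stop_words="english")),
            ("select", SelectKBest(chi2, k=n_features)),    # chi-square feature selection
            ("clf", clf),
        ])
        # Macro-averaged F1 over 5-fold cross-validation.
        score = cross_val_score(pipe, data.data, data.target,
                                cv=5, scoring="f1_macro").mean()
        print(f"{name:<12} k={n_features:>5}  macro-F1 = {score:.3f}")
```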

Bibliographic Details
Main Authors: Zhu, Dengya; Wong, K.
Other Authors: Chu Kiong Loo; Keem Siah Yap; Kok Wai Wong; Andrew Teoh; Kaizhu Huang
Format: Conference Paper
Published: Springer International Publishing, 2014
DOI: 10.1007/978-3-319-12637-1_60
Subjects: feature selection; text classifiers; text categorization
Online Access: http://hdl.handle.net/20.500.11937/26799 (restricted)