Text document pre-processing using the Bayes formula for classification based on the vector space model

Bibliographic Details
Main Authors: Isa, Dino, Hong, Lee Lam, Kallimani, V.P., Rajkumar, R.
Format: Article
Language: English
Published: Canadian Center of Science and Education 2008
Online Access: https://eprints.nottingham.ac.uk/2995/
author Isa, Dino
Hong, Lee Lam
Kallimani, V.P.
Rajkumar, R.
building Nottingham Research Data Repository
collection Online Access
description This work uses the Bayes formula to vectorize a document according to a probability distribution based on keywords, reflecting the probable categories to which the document may belong. The Bayes formula gives a range of probabilities with which the document can be assigned to each of a predetermined set of topics (categories). Using this probability distribution as the vector that represents the document, text classification algorithms based on the vector space model, such as the Support Vector Machine (SVM) and the Self-Organizing Map (SOM), can then be used to classify documents on a multi-dimensional level, thus improving on the results obtained using only the highest probability to classify the document, as the naïve Bayes classifier does when used by itself. The effects of an inadvertent dimensionality reduction can be overcome using these algorithms. We compare the performance of these classifiers for high dimensional data.
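The vectorization step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the category names, the multinomial keyword model, and the Laplace smoothing are all assumptions made for illustration. The sketch turns a tokenized document into a vector of posterior category probabilities instead of keeping only the most probable category.

```python
import math
from collections import Counter

# Hypothetical toy training data: category -> list of tokenized documents.
TRAIN = {
    "sports": [["match", "goal", "team"], ["team", "score", "goal"]],
    "finance": [["stock", "market", "bank"], ["bank", "loan", "market"]],
}

def train(train_docs):
    """Estimate log priors and Laplace-smoothed log likelihoods per category."""
    vocab = {w for docs in train_docs.values() for d in docs for w in d}
    n_docs = sum(len(docs) for docs in train_docs.values())
    model = {}
    for cat, docs in train_docs.items():
        counts = Counter(w for d in docs for w in d)
        total = sum(counts.values())
        model[cat] = {
            "log_prior": math.log(len(docs) / n_docs),
            # Add-one smoothing is an assumption of this sketch.
            "log_like": {
                w: math.log((counts[w] + 1) / (total + len(vocab))) for w in vocab
            },
        }
    return model, vocab

def posterior_vector(model, vocab, tokens):
    """Return the categories (sorted) and P(category | document) for each."""
    cats = sorted(model)
    log_scores = [
        model[c]["log_prior"]
        + sum(model[c]["log_like"][w] for w in tokens if w in vocab)
        for c in cats
    ]
    # Bayes formula: normalize the joint scores so the entries sum to 1.
    peak = max(log_scores)
    weights = [math.exp(s - peak) for s in log_scores]
    norm = sum(weights)
    return cats, [w / norm for w in weights]

model, vocab = train(TRAIN)
cats, vec = posterior_vector(model, vocab, ["goal", "market", "team"])
```

For the mixed document above, `vec` holds roughly 0.25 for `finance` and 0.75 for `sports`. A plain naïve Bayes classifier would keep only the `sports` label, whereas the full vector is what would be passed to a vector space classifier such as an SVM or SOM.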
first_indexed 2025-11-14T18:20:23Z
format Article
id nottingham-2995
institution University of Nottingham Malaysia Campus
institution_category Local University
language English
last_indexed 2025-11-14T21:03:16Z
publishDate 2008
publisher Canadian Center of Science and Education
recordtype eprints
repository_type Digital Repository
spelling nottingham-2995 2025-09-10T14:45:57Z
Article PeerReviewed application/pdf en cc_by
https://eprints.nottingham.ac.uk/2995/1/Isa_Text.pdf
Isa, Dino, Hong, Lee Lam, Kallimani, V.P. and Rajkumar, R. (2008) Text document pre-processing using the Bayes formula for classification based on the vector space model. Computer and Information Science, 1 (4). pp. 79-90. ISSN 1913-8989
http://www.ccsenet.org/journal/index.php/cis/article/view/1058
title Text document pre-processing using the Bayes formula for classification based on the vector space model
url https://eprints.nottingham.ac.uk/2995/