A "non-parametric" version of the naive Bayes classifier

Many algorithms have been proposed for the machine learning task of classification. One of the simplest methods, the naive Bayes classifier, has often been found to give good performance despite the fact that its underlying assumptions (of independence and a Normal distribution of the variables) are perhaps violated. In previous work, we applied naive Bayes and other standard algorithms to a breast cancer database from Nottingham City Hospital in which the variables are highly non-Normal and found that the algorithm performed well when predicting a class that had been derived from the same data. However, when we then applied naive Bayes to predict an alternative clinical variable, it performed much worse than other techniques. This motivated us to propose an alternative method, based on naive Bayes, which removes the requirement for the variables to be Normally distributed, but retains the essential structure and other underlying assumptions of the method. We tested our novel algorithm on our breast cancer data and on three UCI datasets which also exhibited strong violations of Normality. We found our algorithm outperformed naive Bayes in all four cases and outperformed multinomial logistic regression (MLR) in two cases. We conclude that our method offers a competitive alternative to MLR and naive Bayes when dealing with data sets in which non-Normal distributions are observed.

Bibliographic Details
Main Authors: Soria, Daniele, Garibaldi, Jonathan M., Ambrogi, Federico, Biganzoli, Elia M., Ellis, Ian O.
Format: Article (peer reviewed)
Published: Knowledge-Based Systems, 24 (6), pp. 775-784. Elsevier, August 2011. ISSN 0950-7051
DOI: 10.1016/j.knosys.2011.02.014
Online Access: https://eprints.nottingham.ac.uk/28135/
http://www.sciencedirect.com/science/article/pii/S0950705111000414
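
Illustrative sketch (not part of the record): the abstract describes replacing the Normal class-conditional densities of naive Bayes with a non-parametric estimate while keeping the independence assumption and the Bayes-rule structure. The record does not state which density estimator the authors use, so the Python sketch below substitutes a per-feature Gaussian kernel density estimate (scipy.stats.gaussian_kde) purely to show the shape of such a classifier; it should not be read as the paper's actual algorithm.

    import numpy as np
    from scipy.stats import gaussian_kde

    class KDENaiveBayes:
        """Naive-Bayes-style classifier with per-feature kernel density
        estimates in place of Gaussian likelihoods (illustrative sketch,
        not the method described in the paper)."""

        def fit(self, X, y):
            X, y = np.asarray(X, dtype=float), np.asarray(y)
            self.classes_ = np.unique(y)
            self.log_priors_ = {}
            self.kdes_ = {}  # one univariate KDE per (class, feature) pair
            for c in self.classes_:
                Xc = X[y == c]
                self.log_priors_[c] = np.log(len(Xc) / len(X))
                # Independence assumption retained: each feature is modelled
                # separately. Needs several samples per class and
                # non-constant features for the KDE to be well defined.
                self.kdes_[c] = [gaussian_kde(Xc[:, j]) for j in range(X.shape[1])]
            return self

        def predict(self, X):
            X = np.asarray(X, dtype=float)
            log_post = np.empty((X.shape[0], len(self.classes_)))
            for i, c in enumerate(self.classes_):
                # Sum of log densities across features, plus the log prior.
                log_like = sum(np.log(self.kdes_[c][j](X[:, j]) + 1e-300)
                               for j in range(X.shape[1]))
                log_post[:, i] = self.log_priors_[c] + log_like
            return self.classes_[np.argmax(log_post, axis=1)]

Swapping each kernel density estimate for a Normal density fitted to the same feature recovers standard Gaussian naive Bayes, which is the baseline the abstract compares against.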