Bayesian Hierarchical Clustering for Studying Cancer Gene Expression Data with Unknown Statistics

Clustering analysis is an important tool in studying gene expression data. The Bayesian hierarchical clustering (BHC) algorithm can automatically infer the number of clusters and uses Bayesian model selection to improve clustering quality. In this paper, we present an extension of the BHC algorithm....

Bibliographic Details
Main Authors:	Sirinukunwattana, Korsuk; Savage, Richard S.; Bari, Muhammad F.; Snead, David R. J.; Rajpoot, Nasir M.
Format:	Online
Language:	English
Published:	Public Library of Science 2013
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3806770/

id	pubmed-3806770
recordtype	oai_dc
spelling	pubmed-3806770 2013-11-05 Research Article Public Library of Science 2013-10-23 /pmc/articles/PMC3806770/ /pubmed/24194826 http://dx.doi.org/10.1371/journal.pone.0075748 Text en © 2013 Sirinukunwattana et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
repository_type	Open Access Journal
institution_category	Foreign Institution
institution	US National Center for Biotechnology Information
building	NCBI PubMed
collection	Online Access
language	English
format	Online
author	Sirinukunwattana, Korsuk; Savage, Richard S.; Bari, Muhammad F.; Snead, David R. J.; Rajpoot, Nasir M.
author_sort	Sirinukunwattana, Korsuk
title	Bayesian Hierarchical Clustering for Studying Cancer Gene Expression Data with Unknown Statistics
description	Clustering analysis is an important tool in studying gene expression data. The Bayesian hierarchical clustering (BHC) algorithm can automatically infer the number of clusters and uses Bayesian model selection to improve clustering quality. In this paper, we present an extension of the BHC algorithm. Our Gaussian BHC (GBHC) algorithm represents data as a mixture of Gaussian distributions. It uses a normal-gamma distribution as a conjugate prior on the mean and precision of each of the Gaussian components. We tested GBHC on 11 cancer and 3 synthetic datasets. The results on the cancer datasets show that, in sample clustering, GBHC on average produces a clustering partition that is more concordant with the ground truth than those obtained from other commonly used algorithms. Furthermore, GBHC frequently infers a number of clusters that is close to the ground truth. In gene clustering, GBHC also produces a clustering partition that is more biologically plausible than those of several other state-of-the-art methods. This suggests GBHC as an alternative tool for studying gene expression data.
publisher	Public Library of Science
publishDate	2013
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3806770/
_version_	1612020399815524352
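
The description above states that GBHC places a normal-gamma conjugate prior on the mean and precision of each Gaussian component. As a rough illustration only, the sketch below computes the closed-form log marginal likelihood of one-dimensional data under such a prior, which is the kind of quantity a Bayesian model-selection step can compare when deciding whether two groups belong together. The function name, the hyperparameter names (mu0, kappa0, alpha0, beta0), their default values, and the toy data are all assumptions made for this example; it is not the authors' implementation and it omits the tree-building and prior-weighting steps of BHC.

```python
import math


def normal_gamma_log_marginal(x, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Log marginal likelihood of 1-D data x under a Gaussian likelihood
    with a normal-gamma prior on (mean, precision).

    Hyperparameter names and defaults are illustrative assumptions.
    """
    n = len(x)
    mean = sum(x) / n
    # Sum of squared deviations from the sample mean.
    ss = sum((xi - mean) ** 2 for xi in x)

    # Standard conjugate updates for the normal-gamma prior.
    kappa_n = kappa0 + n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (mean - mu0) ** 2 / (2.0 * kappa_n)

    # Closed-form evidence, with mean and precision integrated out.
    return (
        math.lgamma(alpha_n) - math.lgamma(alpha0)
        + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
        + 0.5 * (math.log(kappa0) - math.log(kappa_n))
        - 0.5 * n * math.log(2.0 * math.pi)
    )


# Toy data (made up for illustration): two well-separated groups.
group_a = [-5.1, -4.9, -5.0]
group_b = [5.0, 5.2, 4.8]

# Evidence for "one shared Gaussian" versus "two independent Gaussians".
merged = normal_gamma_log_marginal(group_a + group_b)
separate = normal_gamma_log_marginal(group_a) + normal_gamma_log_marginal(group_b)
```

With these toy values the two-component explanation has the higher evidence, so a comparison of this kind would favour keeping the groups apart. The full BHC merge criterion also weights each hypothesis by a prior, which this sketch leaves out.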

Similar Items