Biological-based semi-supervised clustering algorithm to improve gene function prediction

Analysis of simultaneous clustering of gene expression with biological knowledge has now become an important technique and standard practice to present a proper interpretation of the data and its underlying biology. However, common clustering algorithms do not provide a comprehensive approach that loo...

Bibliographic Details
Main Authors:	Kasim, Shahreen; Deris, Safaai; M. Othman, Razib; Hashim, Rathiah
Format:	Article
Published:	Journal of Computing 2011
Subjects:	QA75 Electronic computers. Computer science
Online Access:	http://eprints.utm.my/6964/

_version_	1848891380689010688
author	Kasim, Shahreen; Deris, Safaai; M. Othman, Razib; Hashim, Rathiah
author_facet	Kasim, Shahreen; Deris, Safaai; M. Othman, Razib; Hashim, Rathiah
author_sort	Kasim, Shahreen
building	UTeM Institutional Repository
collection	Online Access
description	Analysis of simultaneous clustering of gene expression with biological knowledge has now become an important technique and standard practice to present a proper interpretation of the data and its underlying biology. However, common clustering algorithms do not provide a comprehensive approach that looks into the three categories of annotations (biological process, molecular function, and cellular component), and they have not been tested with different functional annotation database formats. Furthermore, traditional clustering algorithms use random initialization, which causes inconsistent cluster generation, and are unable to determine the number of clusters involved. In this paper, we present a novel computational framework called CluFA (Clustering Functional Annotation) for semi-supervised clustering of gene expression data. The framework consists of three stages: (i) preparation of Gene Ontology (GO) datasets, functional annotation databases, and testing datasets; (ii) fuzzy c-means clustering to find the optimal clusters; and (iii) analysis of computational evaluation and biological validation from the results obtained. By combining the three GO term categories (biological process, molecular function, and cellular component) with functional annotation databases (the Saccharomyces Genome Database (SGD), the Yeast Database at the Munich Information Centre for Protein Sequences (MIPS), and Entrez), CluFA is able to determine the number of clusters and reduce random initialization. In addition, CluFA is more comprehensive in its capability to predict the functions of unknown genes. We tested our new computational framework for semi-supervised clustering of yeast gene expression data based on multiple functional annotation databases. Experimental results show that 76 clusters were identified via the GO slim dataset. By applying the SGD, Entrez, and MIPS functional annotation databases to reduce random initialization, performance on both computational evaluation and biological validation was improved. By using comprehensive GO term categories, the lowest compactness and separation values were achieved. Therefore, from this experiment, we can conclude that CluFA improved gene function prediction through the utilization of GO and gene expression values using the fuzzy c-means clustering algorithm, cross-referenced with the latest SGD annotation.
first_indexed	2025-11-15T20:57:03Z
format	Article
id	utm-6964
institution	Universiti Teknologi Malaysia
institution_category	Local University
last_indexed	2025-11-15T20:57:03Z
publishDate	2011
publisher	Journal of Computing
recordtype	eprints
repository_type	Digital Repository
spelling	utm-6964 2017-02-15T00:30:26Z http://eprints.utm.my/6964/ Journal of Computing 2011 Article PeerReviewed Kasim, Shahreen and Deris, Safaai and M. Othman, Razib and Hashim, Rathiah (2011) Biological-based semi-supervised clustering algorithm to improve gene function prediction. Journal of Computing, 3 (4). pp. 1-11. ISSN 2151-9617 http://www.scribd.com/doc/54846391/Biological-based-Semi-supervised-Clustering-Algorithm-to-Improve-Gene-Function-Prediction
title	Biological-based semi-supervised clustering algorithm to improve gene function prediction
topic	QA75 Electronic computers. Computer science
url	http://eprints.utm.my/6964/
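
The description above mentions fuzzy c-means clustering in which functional annotations, rather than random initialization, fix the starting point and the number of clusters. The following is a minimal sketch of that general idea only, not the CluFA implementation: the seeding rule (one initial center per annotation label, taken as the mean profile of the annotated genes), the function names `seed_centers` and `fuzzy_c_means`, the fuzzifier `m=2.0`, and the toy gene profiles and labels are all assumptions made for illustration.

```python
import math


def seed_centers(profiles, annotations):
    """Hypothetical seeding: mean expression profile per annotation label.

    The number of distinct labels fixes the number of clusters, so no
    random initialization is needed.
    """
    groups = {}
    for gene, label in annotations.items():
        groups.setdefault(label, []).append(profiles[gene])
    return {
        label: [sum(col) / len(col) for col in zip(*rows)]
        for label, rows in sorted(groups.items())
    }


def fuzzy_c_means(points, centers, m=2.0, iters=100, tol=1e-6):
    """Standard fuzzy c-means, started from the given centers.

    Returns (centers, memberships); memberships[i][k] is the degree to
    which point i belongs to cluster k, and each row sums to 1.
    """
    centers = [list(c) for c in centers]
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)).
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if any(d == 0.0 for d in dists):
                # The point sits exactly on a center: assign it there fully.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                memberships.append([h / sum(hits) for h in hits])
            else:
                memberships.append([
                    1.0 / sum((dk / dj) ** exponent for dj in dists)
                    for dk in dists
                ])
        # Center update: mean of all points weighted by u_ik^m.
        new_centers = []
        for k in range(len(centers)):
            weights = [u[k] ** m for u in memberships]
            total = sum(weights)
            new_centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(len(points[0]))
            ])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


# Toy data (invented): six genes, two of which carry an annotation label.
profiles = {
    "g1": (0.0, 0.1), "g2": (0.2, 0.0), "g3": (0.1, 0.2),
    "g4": (5.0, 5.1), "g5": (5.2, 4.9), "g6": (4.9, 5.0),
}
annotations = {"g1": "termA", "g4": "termB"}

seeds = seed_centers(profiles, annotations)
labels = list(seeds)
genes = list(profiles)
centers, memberships = fuzzy_c_means(
    [profiles[g] for g in genes], [seeds[label] for label in labels]
)
# Predict a label for every gene from its strongest membership.
predicted = {
    g: labels[max(range(len(labels)), key=lambda k: u[k])]
    for g, u in zip(genes, memberships)
}
```

In this sketch the unannotated genes `g2`, `g3`, `g5`, and `g6` receive the label of the seeded cluster in which their membership is highest, which mirrors in miniature how annotation-seeded clustering can suggest functions for unknown genes.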

Similar Items