Biological-based semi-supervised clustering algorithm to improve gene function prediction

Analysis of simultaneous clustering of gene expression with biological knowledge has now become an important technique and standard practice to present a proper interpretation of the data and its underlying biology. However, common clustering algorithms do not provide a comprehensive approach that loo...

Bibliographic Details
Main Authors: Kasim, Shahreen; Deris, Safaai; M. Othman, Razib; Hashim, Rathiah
Format: Article
Published: Journal of Computing 2011
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/6964/
_version_ 1848891380689010688
author Kasim, Shahreen
Deris, Safaai
M. Othman, Razib
Hashim, Rathiah
author_sort Kasim, Shahreen
building UTeM Institutional Repository
collection Online Access
description Analysis of simultaneous clustering of gene expression with biological knowledge has now become an important technique and standard practice to present a proper interpretation of the data and its underlying biology. However, common clustering algorithms do not provide a comprehensive approach that looks into the three categories of annotations (biological process, molecular function, and cellular component), and they were not tested with different functional annotation database formats. Furthermore, the traditional clustering algorithms use random initialization, which causes inconsistent cluster generation, and are unable to determine the number of clusters involved. In this paper, we present a novel computational framework called CluFA (Clustering Functional Annotation) for semi-supervised clustering of gene expression data. The framework consists of three stages: (i) preparation of Gene Ontology (GO) datasets, functional annotation databases, and testing datasets; (ii) fuzzy c-means clustering to find the optimal clusters; and (iii) analysis of computational evaluation and biological validation from the results obtained. By combining the three GO term categories (biological process, molecular function, and cellular component) with functional annotation databases (the Saccharomyces Genome Database (SGD), the Yeast Database at the Munich Information Centre for Protein Sequences (MIPS), and Entrez), CluFA is able to determine the number of clusters and reduce random initialization. In addition, CluFA is more comprehensive in its capability to predict the functions of unknown genes. We tested our new computational framework for semi-supervised clustering of yeast gene expression data based on multiple functional annotation databases.
Experimental results show that 76 clusters were identified via the GO slim dataset. By applying the SGD, Entrez, and MIPS functional annotation databases to reduce random initialization, performance on both computational evaluation and biological validation was improved. By using the comprehensive GO term categories, the lowest compactness and separation values were achieved. Therefore, from this experiment, we can conclude that CluFA improved gene function prediction through the utilization of GO and gene expression values using the fuzzy c-means clustering algorithm, by cross-referencing it with the latest SGD annotation.
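The abstract's central idea, fuzzy c-means clustering started from annotation-derived centers rather than random ones, can be illustrated with a minimal sketch. This is not the CluFA implementation: the function names, the toy expression profiles, the labels "process_A" and "process_B", and the particular compactness and separation formulas are all assumptions made for illustration, since the record does not give the paper's definitions. Here the number of clusters is simply taken to be the number of distinct annotation labels.

```python
import math


def seed_centers(profiles, annotations):
    """Average the profiles of genes that share a (hypothetical) annotation label."""
    groups = {}
    for gene, label in annotations.items():
        groups.setdefault(label, []).append(profiles[gene])
    labels = sorted(groups)
    centers = [[sum(col) / len(col) for col in zip(*groups[l])] for l in labels]
    return labels, centers


def fuzzy_c_means(points, init_centers, m=2.0, max_iter=100, tol=1e-6):
    """Standard fuzzy c-means with caller-supplied initial centers (no random start)."""
    centers = [list(c) for c in init_centers]
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(max_iter):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if 0.0 in dists:
                # The point sits exactly on a center: full membership there.
                hit = dists.index(0.0)
                row = [1.0 if i == hit else 0.0 for i in range(len(centers))]
            else:
                row = [1.0 / sum((di / dj) ** exponent for dj in dists) for di in dists]
            memberships.append(row)
        # Center update: mean of all points weighted by u_ik^m
        new_centers = []
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            new_centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(len(points[0]))
            ])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


def compactness_and_separation(points, centers, memberships, m=2.0):
    """One common pair of validity measures (assumed here, not taken from the paper)."""
    compactness = sum(
        (u ** m) * math.dist(p, c) ** 2
        for p, row in zip(points, memberships)
        for u, c in zip(row, centers)
    ) / len(points)
    separation = min(
        math.dist(a, b) ** 2
        for i, a in enumerate(centers)
        for b in centers[i + 1:]
    )
    return compactness, separation


# Toy data: made-up two-condition expression profiles for six genes.
profiles = {
    "g1": [0.1, 0.2], "g2": [0.0, 0.1], "g3": [5.0, 5.1],
    "g4": [5.2, 4.9], "g5": [0.2, 0.0], "g6": [4.8, 5.0],
}
# Only two genes carry a (hypothetical) annotation; the rest are "unknown".
annotations = {"g1": "process_A", "g3": "process_B"}

genes = list(profiles)
points = [profiles[g] for g in genes]
labels, init = seed_centers(profiles, annotations)
centers, memberships = fuzzy_c_means(points, init)
compactness, separation = compactness_and_separation(points, centers, memberships)

# Predict a label for each gene from its strongest membership.
predicted = {g: labels[row.index(max(row))] for g, row in zip(genes, memberships)}
```

Because the starting centers are fixed by the annotations, repeated runs on the same input give the same clusters, which is the consistency property the abstract contrasts with random initialization.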
first_indexed 2025-11-15T20:57:03Z
format Article
id utm-6964
institution Universiti Teknologi Malaysia
institution_category Local University
last_indexed 2025-11-15T20:57:03Z
publishDate 2011
publisher Journal of Computing
recordtype eprints
repository_type Digital Repository
spelling utm-6964 2017-02-15T00:30:26Z http://eprints.utm.my/6964/ Biological-based semi-supervised clustering algorithm to improve gene function prediction Kasim, Shahreen Deris, Safaai M. Othman, Razib Hashim, Rathiah QA75 Electronic computers. Computer science Journal of Computing 2011 Article PeerReviewed Kasim, Shahreen and Deris, Safaai and M. Othman, Razib and Hashim, Rathiah (2011) Biological-based semi-supervised clustering algorithm to improve gene function prediction. Journal of Computing, 3 (4). pp. 1-11. ISSN 2151-9617 http://www.scribd.com/doc/54846391/Biological-based-Semi-supervised-Clustering-Algorithm-to-Improve-Gene-Function-Prediction
title Biological-based semi-supervised clustering algorithm to improve gene function prediction
topic QA75 Electronic computers. Computer science
url http://eprints.utm.my/6964/