Biological-based semi-supervised clustering algorithm to improve gene function prediction
Analysis of simultaneous clustering of gene expression with biological knowledge has now become an important technique and standard practice to present a proper interpretation of the data and its underlying biology. However, common clustering algorithms do not provide a comprehensive approach that loo...
| Main Authors: | Kasim, Shahreen; Deris, Safaai; M. Othman, Razib; Hashim, Rathiah |
|---|---|
| Format: | Article |
| Published: | Journal of Computing 2011 |
| Subjects: | QA75 Electronic computers. Computer science |
| Online Access: | http://eprints.utm.my/6964/ |
| _version_ | 1848891380689010688 |
|---|---|
| author | Kasim, Shahreen; Deris, Safaai; M. Othman, Razib; Hashim, Rathiah |
| author_facet | Kasim, Shahreen; Deris, Safaai; M. Othman, Razib; Hashim, Rathiah |
| author_sort | Kasim, Shahreen |
| building | UTeM Institutional Repository |
| collection | Online Access |
| description | Analysis of simultaneous clustering of gene expression with biological knowledge has now become an important technique and standard practice to present a proper interpretation of the data and its underlying biology. However, common clustering algorithms do not provide a comprehensive approach that looks into the three categories of annotations (biological process, molecular function, and cellular component), and they were not tested with different functional annotation database formats. Furthermore, traditional clustering algorithms use random initialization, which causes inconsistent cluster generation, and are unable to determine the number of clusters involved. In this paper, we present a novel computational framework called CluFA (Clustering Functional Annotation) for semi-supervised clustering of gene expression data. The framework consists of three stages: (i) preparation of Gene Ontology (GO) datasets, functional annotation databases, and testing datasets; (ii) fuzzy c-means clustering to find the optimal clusters; and (iii) analysis of computational evaluation and biological validation from the results obtained. With the combination of the three GO term categories (biological process, molecular function, and cellular component) and functional annotation databases (Saccharomyces Genome Database (SGD), the Yeast Database at Munich Information Centre for Protein Sequences (MIPS), and Entrez), CluFA is able to determine the number of clusters and reduce random initialization. In addition, CluFA is more comprehensive in its capability to predict the functions of unknown genes. We tested our new computational framework for semi-supervised clustering of yeast gene expression data based on multiple functional annotation databases. Experimental results show that 76 clusters were identified via the GO slim dataset. By applying the SGD, Entrez, and MIPS functional annotation databases to reduce random initialization, performance on both computational evaluation and biological validation was improved. By using comprehensive GO term categories, the lowest compactness and separation values were achieved. Therefore, from this experiment, we can conclude that CluFA improved gene function prediction through the utilization of GO and gene expression values, using the fuzzy c-means clustering algorithm and cross-referencing it with the latest SGD annotation. |
| first_indexed | 2025-11-15T20:57:03Z |
| format | Article |
| id | utm-6964 |
| institution | Universiti Teknologi Malaysia |
| institution_category | Local University |
| last_indexed | 2025-11-15T20:57:03Z |
| publishDate | 2011 |
| publisher | Journal of Computing |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | utm-6964 2017-02-15T00:30:26Z http://eprints.utm.my/6964/ Biological-based semi-supervised clustering algorithm to improve gene function prediction Kasim, Shahreen Deris, Safaai M. Othman, Razib Hashim, Rathiah QA75 Electronic computers. Computer science Journal of Computing 2011 Article PeerReviewed Kasim, Shahreen and Deris, Safaai and M. Othman, Razib and Hashim, Rathiah (2011) Biological-based semi-supervised clustering algorithm to improve gene function prediction. Journal of Computing, 3 (4). pp. 1-11. ISSN 2151-9617 http://www.scribd.com/doc/54846391/Biological-based-Semi-supervised-Clustering-Algorithm-to-Improve-Gene-Function-Prediction |
| spellingShingle | QA75 Electronic computers. Computer science Kasim, Shahreen Deris, Safaai M. Othman, Razib Hashim, Rathiah Biological-based semi-supervised clustering algorithm to improve gene function prediction |
| title | Biological-based semi-supervised clustering algorithm to improve gene function prediction |
| title_sort | biological-based semi-supervised clustering algorithm to improve gene function prediction |
| topic | QA75 Electronic computers. Computer science |
| url | http://eprints.utm.my/6964/ |
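
The description field names fuzzy c-means clustering as stage (ii) and says the annotation databases are used to reduce random initialization. The sketch below illustrates that general idea only: it runs standard fuzzy c-means from caller-supplied seed centers instead of random ones. It is not the CluFA implementation. The function names (`memberships`, `update_centers`, `fuzzy_c_means`), the fuzzifier default `m=2.0`, and the idea that seeds would be mean expression profiles of genes sharing an annotation are all assumptions made for illustration; the record does not say how CluFA derives its starting points.

```python
import math


def memberships(points, centers, m=2.0):
    """Standard fuzzy c-means membership update (fuzzifier m > 1)."""
    power = 2.0 / (m - 1.0)
    u = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        if 0.0 in dists:
            # A point lying exactly on a center belongs fully to it
            # (shared equally if several centers coincide with it).
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            u.append([h / total for h in hits])
        else:
            inv = [d ** -power for d in dists]
            total = sum(inv)
            u.append([v / total for v in inv])
    return u


def update_centers(points, u, m=2.0):
    """Recompute each center as the membership-weighted mean of the points."""
    dim = len(points[0])
    centers = []
    for k in range(len(u[0])):
        weights = [row[k] ** m for row in u]
        total = sum(weights)
        centers.append([
            sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dim)
        ])
    return centers


def fuzzy_c_means(points, seed_centers, m=2.0, max_iter=100, tol=1e-6):
    """Run fuzzy c-means from fixed seed centers (hypothetical stand-in
    for annotation-derived starting points) until centers stop moving."""
    centers = [list(c) for c in seed_centers]
    for _ in range(max_iter):
        u = memberships(points, centers, m)
        new_centers = update_centers(points, u, m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)


# Toy two-condition "expression profiles"; the values are made up.
profiles = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
seeds = [(0.0, 0.0), (5.0, 5.0)]
centers, u = fuzzy_c_means(profiles, seeds)
hard_labels = [max(range(len(row)), key=row.__getitem__) for row in u]
```

Because the starting centers are fixed rather than sampled, repeated runs on the same data return the same clusters, which is the consistency property the description attributes to reducing random initialization.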