An integrated approach to enhancing functional annotation of sequences for data analysis of a transcriptome
Given the ever-increasing quantity of sequence data, functional annotation of new gene sequences remains a significant challenge for bioinformatics. This is a particular problem for transcriptomics studies in crop plants, where large genomes and evolutionarily distant model organisms mean...
| Main Author: | Hindle, Matthew Morritt |
|---|---|
| Format: | Thesis (University of Nottingham only) |
| Language: | English |
| Published: | 2012 |
| Subjects: | |
| Online Access: | https://eprints.nottingham.ac.uk/12580/ |
| _version_ | 1848791530760830976 |
|---|---|
| author | Hindle, Matthew Morritt |
| author_facet | Hindle, Matthew Morritt |
| author_sort | Hindle, Matthew Morritt |
| building | Nottingham Research Data Repository |
| collection | Online Access |
| description | Given the ever-increasing quantity of sequence data, functional annotation of new gene sequences remains a significant challenge for bioinformatics. This is a particular problem for transcriptomics studies in crop plants, where large genomes and evolutionarily distant model organisms mean that identifying the function of a given gene used on a microarray is often a non-trivial task. Information pertinent to gene annotations is spread across technically and semantically heterogeneous biological databases. Combining and exploiting these data in a consistent way has the potential to improve our ability to assign functions to new or uncharacterised genes. Methods: The Ondex data integration framework was further developed to integrate databases pertinent to plant gene annotation and to provide data inference tools. The CoPSA annotation pipeline was created to provide automated annotation of novel plant genes using this knowledgebase. CoPSA was used to derive annotations for Affymetrix GeneChips available for plant species. A conjoint approach was used to align GeneChip sequences to orthologous proteins and to identify protein domain regions. These proteins and domains were used together with multiple lines of evidence to predict functional annotations for sequences on the GeneChip. Quality was assessed with reference to other annotation pipelines. These improved gene annotations were used in the analysis of a time-series transcriptomics study of the differential responses of durum wheat varieties to water stress. Results and Conclusions: The integration of plant databases using Ondex showed that it was possible to increase the overall quantity and quality of information available, and thereby improve the resulting annotation. Direct data aggregation benefits were observed, as well as new information derived from inference across databases. The CoPSA pipeline was shown to improve coverage of the wheat microarray compared to the NetAffx and BLAST2GO pipelines. Leveraging these annotations during the analysis of data from a transcriptomics study of durum wheat water stress responses yielded new biological insights into water stress and highlighted potential candidate genes that could be used by breeders to improve drought response. |
| first_indexed | 2025-11-14T18:29:59Z |
| format | Thesis (University of Nottingham only) |
| id | nottingham-12580 |
| institution | University of Nottingham Malaysia Campus |
| institution_category | Local University |
| language | English |
| last_indexed | 2025-11-14T18:29:59Z |
| publishDate | 2012 |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | nottingham-12580 2025-02-28T11:20:05Z https://eprints.nottingham.ac.uk/12580/ An integrated approach to enhancing functional annotation of sequences for data analysis of a transcriptome Hindle, Matthew Morritt 2012-07-11 Thesis (University of Nottingham only) NonPeerReviewed application/pdf en arr https://eprints.nottingham.ac.uk/12580/1/thesis2.pdf Hindle, Matthew Morritt (2012) An integrated approach to enhancing functional annotation of sequences for data analysis of a transcriptome. PhD thesis, University of Nottingham. Bioinformatics Wheat Transcriptomics Data Integration Query Drought Water Stress |
| title | An integrated approach to enhancing functional annotation of sequences for data analysis of a transcriptome |
| title_sort | integrated approach to enhancing functional annotation of sequences for data analysis of a transcriptome |
| topic | Bioinformatics Wheat Transcriptomics Data Integration Query Drought Water Stress |
| url | https://eprints.nottingham.ac.uk/12580/ |