An integrated approach to enhancing functional annotation of sequences for data analysis of a transcriptome

Given the ever-increasing quantity of sequence data, functional annotation of new gene sequences remains a significant challenge for bioinformatics. This is a particular problem for transcriptomics studies in crop plants, where large genomes and evolutionarily distant model organisms mean...

Bibliographic Details
Main Author: Hindle, Matthew Morritt
Format: Thesis (University of Nottingham only)
Language: English
Published: 2012
Subjects: Bioinformatics; Wheat; Transcriptomics; Data Integration; Query; Drought; Water Stress
Online Access: https://eprints.nottingham.ac.uk/12580/
_version_ 1848791530760830976
author Hindle, Matthew Morritt
author_facet Hindle, Matthew Morritt
author_sort Hindle, Matthew Morritt
building Nottingham Research Data Repository
collection Online Access
description Given the ever-increasing quantity of sequence data, functional annotation of new gene sequences remains a significant challenge for bioinformatics. This is a particular problem for transcriptomics studies in crop plants, where large genomes and evolutionarily distant model organisms mean that identifying the function of a given gene used on a microarray is often a non-trivial task. Information pertinent to gene annotation is spread across technically and semantically heterogeneous biological databases. Combining and exploiting these data in a consistent way has the potential to improve our ability to assign functions to new or uncharacterised genes. Methods: The Ondex data integration framework was further developed to integrate databases pertinent to plant gene annotation and to provide data inference tools. The CoPSA annotation pipeline was created to provide automated annotation of novel plant genes using this knowledgebase. CoPSA was used to derive annotations for Affymetrix GeneChips available for plant species. A conjoint approach was used to align GeneChip sequences to orthologous proteins and to identify protein domain regions. These proteins and domains were used together with multiple lines of evidence to predict functional annotations for sequences on the GeneChip. Quality was assessed with reference to other annotation pipelines. These improved gene annotations were used in the analysis of a time-series transcriptomics study of the differential responses of durum wheat varieties to water stress. Results and Conclusions: The integration of plant databases using Ondex showed that it was possible to increase the overall quantity and quality of information available, and thereby improve the resulting annotation. Direct data aggregation benefits were observed, as well as new information derived from inference across databases. The CoPSA pipeline was shown to improve coverage of the wheat microarray compared to the NetAffx and BLAST2GO pipelines. Using these annotations in the analysis of data from a transcriptomics study of durum wheat water stress responses yielded new biological insights into water stress and highlighted potential candidate genes that could be used by breeders to improve drought response.
first_indexed 2025-11-14T18:29:59Z
format Thesis (University of Nottingham only)
id nottingham-12580
institution University of Nottingham Malaysia Campus
institution_category Local University
language English
last_indexed 2025-11-14T18:29:59Z
publishDate 2012
recordtype eprints
repository_type Digital Repository
spelling nottingham-12580 2025-02-28T11:20:05Z https://eprints.nottingham.ac.uk/12580/ An integrated approach to enhancing functional annotation of sequences for data analysis of a transcriptome Hindle, Matthew Morritt 2012-07-11 Thesis (University of Nottingham only) NonPeerReviewed application/pdf en arr https://eprints.nottingham.ac.uk/12580/1/thesis2.pdf Hindle, Matthew Morritt (2012) An integrated approach to enhancing functional annotation of sequences for data analysis of a transcriptome. PhD thesis, University of Nottingham. Bioinformatics Wheat Transcriptomics Data Integration Query Drought Water Stress
spellingShingle Bioinformatics
Wheat
Transcriptomics
Data Integration
Query
Drought
Water Stress
Hindle, Matthew Morritt
An integrated approach to enhancing functional annotation of sequences for data analysis of a transcriptome
title An integrated approach to enhancing functional annotation of sequences for data analysis of a transcriptome
title_full An integrated approach to enhancing functional annotation of sequences for data analysis of a transcriptome
title_fullStr An integrated approach to enhancing functional annotation of sequences for data analysis of a transcriptome
title_full_unstemmed An integrated approach to enhancing functional annotation of sequences for data analysis of a transcriptome
title_short An integrated approach to enhancing functional annotation of sequences for data analysis of a transcriptome
title_sort integrated approach to enhancing functional annotation of sequences for data analysis of a transcriptome
topic Bioinformatics
Wheat
Transcriptomics
Data Integration
Query
Drought
Water Stress
url https://eprints.nottingham.ac.uk/12580/