Summary: | Alternative transcript processing is an important mechanism for generating functional
diversity in genes. However, little is known about the precise functions of individual
isoforms. In fact, proteins (translated from transcript isoforms), not genes, are the
function carriers. By integrating multiple human RNA-seq data sets, we carried out the
first systematic prediction of isoform functions, enabling high-resolution functional
annotation of the human transcriptome. Unlike gene function prediction, isoform function
prediction faces a unique challenge: the lack of training data, because all known
functional annotations are at the gene level. To address this challenge, we modelled the
gene–isoform relationships as multiple instance data and developed a novel label
propagation method to predict functions. Our method achieved an average area under the
receiver operating characteristic curve of 0.67 and assigned functions to 15 572 isoforms.
Interestingly, we observed that different functions have different sensitivities to
alternative isoform processing, and that the function diversity of isoforms from the same
gene is positively correlated with their tissue expression diversity. Finally, we surveyed
the literature to validate our predictions for a number of apoptotic genes. Strikingly,
for the famous ‘TP53’ gene, we not only accurately identified the apoptosis
regulation function of its five isoforms, but also correctly predicted the precise
direction of the regulation.
|
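
The multiple-instance framing in the summary can be illustrated with a small sketch. This is not the authors' algorithm: it is a generic label propagation with restart over a toy isoform similarity network. It assumes that every isoform initially inherits its gene's annotation and that a gene-level score is the maximum over its isoforms (the usual multiple-instance assumption). The function names `propagate_isoform_scores` and `gene_level_scores`, the parameters `alpha` and `n_iter`, and all data are hypothetical.

```python
def propagate_isoform_scores(weights, gene_of, positive_genes, alpha=0.8, n_iter=50):
    """Toy label propagation over isoforms (illustrative only).

    weights: symmetric n x n list of lists of isoform-isoform similarities
    gene_of: list mapping isoform index -> gene name (the "bag")
    positive_genes: set of genes annotated with the function of interest
    """
    n = len(weights)
    # Row-normalise so each isoform averages over its neighbours.
    norm = []
    for row in weights:
        total = sum(row)
        norm.append([w / total if total else 0.0 for w in row])

    # Gene-level labels are the only supervision: each isoform starts
    # with the annotation of the gene it belongs to.
    prior = [1.0 if gene_of[i] in positive_genes else 0.0 for i in range(n)]
    scores = prior[:]

    for _ in range(n_iter):
        # Mix the neighbours' current scores with the inherited prior.
        scores = [
            alpha * sum(norm[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * prior[i]
            for i in range(n)
        ]
    return scores


def gene_level_scores(scores, gene_of):
    """Multiple-instance aggregation: a gene scores as its best isoform."""
    best = {}
    for score, gene in zip(scores, gene_of):
        best[gene] = max(score, best.get(gene, 0.0))
    return best


# Hypothetical network: isoform A1 resembles C1 (gene C is annotated),
# while A2 resembles the isoforms of the unannotated gene B.
gene_of = ["A", "A", "B", "B", "C"]
weights = [
    [0, 0, 0, 0, 1],  # A1 - C1
    [0, 0, 1, 1, 0],  # A2 - B1, B2
    [0, 1, 0, 1, 0],  # B1 - A2, B2
    [0, 1, 1, 0, 0],  # B2 - A2, B1
    [1, 0, 0, 0, 0],  # C1 - A1
]
scores = propagate_isoform_scores(weights, gene_of, positive_genes={"A", "C"})
gene_scores = gene_level_scores(scores, gene_of)
```

In this toy network both isoforms of gene A start with the same inherited label, but A1 ends with a higher score than A2 because its neighbour belongs to an annotated gene. This is the kind of isoform-level resolution that gene-level annotation alone cannot provide. Taking the maximum lets the isoform-level scores be compared against the available gene-level annotations.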