Multiple alignment of protein sequences with repeats and rearrangements
Multiple sequence alignments are the usual starting point for analyses of protein structure and evolution. For proteins with repeated, shuffled and missing domains, however, traditional multiple sequence alignment algorithms fail to provide an accurate view of homology between related proteins, because they either assume that the input sequences are globally alignable or require locally alignable regions to appear in the same order in all sequences.
Main Authors: | Phuong, Tu Minh; Do, Chuong B.; Edgar, Robert C.; Batzoglou, Serafim |
---|---|
Format: | Online |
Language: | English |
Published: | Oxford University Press, 2006 |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1635250/ |
id | pubmed-1635250 |
---|---|
recordtype | oai_dc |
spelling | pubmed-1635250; 2006-12-26; Multiple alignment of protein sequences with repeats and rearrangements; Phuong, Tu Minh; Do, Chuong B.; Edgar, Robert C.; Batzoglou, Serafim; Computational Biology; Oxford University Press; 2006-11; 2006-11-26; /pmc/articles/PMC1635250/; /pubmed/17068081; http://dx.doi.org/10.1093/nar/gkl511; Text; en; © 2006 The Author(s) |
repository_type | Open Access Journal |
institution_category | Foreign Institution |
institution | US National Center for Biotechnology Information |
building | NCBI PubMed |
collection | Online Access |
language | English |
format | Online |
author | Phuong, Tu Minh; Do, Chuong B.; Edgar, Robert C.; Batzoglou, Serafim |
title | Multiple alignment of protein sequences with repeats and rearrangements |
description | Multiple sequence alignments are the usual starting point for analyses of protein structure and evolution. For proteins with repeated, shuffled and missing domains, however, traditional multiple sequence alignment algorithms fail to provide an accurate view of homology between related proteins, because they either assume that the input sequences are globally alignable or require locally alignable regions to appear in the same order in all sequences. In this paper, we present ProDA, a novel system for automated detection and alignment of homologous regions in collections of proteins with arbitrary domain architectures. Given an input set of unaligned sequences, ProDA identifies all homologous regions appearing in one or more sequences, and returns a collection of local multiple alignments for these regions. On a subset of the BAliBASE benchmarking suite containing curated alignments of proteins with complicated domain architectures, ProDA performs well in detecting conserved domain boundaries and clustering domain segments, achieving the highest accuracy to date for this task. We conclude that ProDA is a practical tool for automated alignment of protein sequences with repeats and rearrangements in their domain architecture. |
publisher | Oxford University Press |
publishDate | 2006 |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1635250/ |
_version_ | 1611390463412011008 |