Multiple alignment of protein sequences with repeats and rearrangements

Multiple sequence alignments are the usual starting point for analyses of protein structure and evolution. For proteins with repeated, shuffled and missing domains, however, traditional multiple sequence alignment algorithms fail to provide an accurate view of homology between related proteins, beca...


Bibliographic Details
Main Authors: Phuong, Tu Minh; Do, Chuong B.; Edgar, Robert C.; Batzoglou, Serafim
Format: Online
Language: English
Published: Oxford University Press 2006
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1635250/
id pubmed-1635250
recordtype oai_dc
spelling pubmed-1635250 2006-12-26 Computational Biology Oxford University Press 2006-11 2006-11-26 /pmc/articles/PMC1635250/ /pubmed/17068081 http://dx.doi.org/10.1093/nar/gkl511 Text en © 2006 The Author(s)
repository_type Open Access Journal
institution_category Foreign Institution
institution US National Center for Biotechnology Information
building NCBI PubMed
collection Online Access
author Phuong, Tu Minh
Do, Chuong B.
Edgar, Robert C.
Batzoglou, Serafim
title Multiple alignment of protein sequences with repeats and rearrangements
description Multiple sequence alignments are the usual starting point for analyses of protein structure and evolution. For proteins with repeated, shuffled and missing domains, however, traditional multiple sequence alignment algorithms fail to provide an accurate view of homology between related proteins, because they either assume that the input sequences are globally alignable or require locally alignable regions to appear in the same order in all sequences. In this paper, we present ProDA, a novel system for automated detection and alignment of homologous regions in collections of proteins with arbitrary domain architectures. Given an input set of unaligned sequences, ProDA identifies all homologous regions appearing in one or more sequences, and returns a collection of local multiple alignments for these regions. On a subset of the BAliBASE benchmarking suite containing curated alignments of proteins with complicated domain architectures, ProDA performs well in detecting conserved domain boundaries and clustering domain segments, achieving the highest accuracy to date for this task. We conclude that ProDA is a practical tool for automated alignment of protein sequences with repeats and rearrangements in their domain architecture.