Quantifying the Displacement of Mismatches in Multiple Sequence Alignment Benchmarks
Multiple Sequence Alignment (MSA) methods are typically benchmarked on sets of reference alignments. The quality of the alignment can then be represented by the sum-of-pairs (SP) or column (CS) scores, which measure the agreement between a reference and corresponding query alignment. Both the SP and...
Main Authors: | Bawono, Punto; van der Velde, Arjan; Abeln, Sanne; Heringa, Jaap |
---|---|
Format: | Online |
Language: | English |
Published: | Public Library of Science 2015 |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4438059/ |
id |
pubmed-4438059 |
---|---|
recordtype |
oai_dc |
spelling |
pubmed-4438059 2015-05-29 Quantifying the Displacement of Mismatches in Multiple Sequence Alignment Benchmarks. Bawono, Punto; van der Velde, Arjan; Abeln, Sanne; Heringa, Jaap. Research Article. Multiple Sequence Alignment (MSA) methods are typically benchmarked on sets of reference alignments. The quality of the alignment can then be represented by the sum-of-pairs (SP) or column (CS) scores, which measure the agreement between a reference and corresponding query alignment. Both the SP and CS scores treat mismatches between a query and reference alignment as equally bad, and do not take into account the separation in the query alignment between two amino acids that should have been matched according to the reference alignment. This is significant since the magnitude of alignment shifts is often of relevance in biological analyses, including homology modeling and MSA refinement/manual alignment editing. In this study we develop a new alignment benchmark scoring scheme, SPdist, that takes the degree of discordance of mismatches into account by measuring the sequence distance between mismatched residue pairs in the query alignment. Using this new score along with the standard SP score, we investigate the discriminatory behavior of the new score by assessing how well six different MSA methods perform with respect to BAliBASE reference alignments. The SP score and the SPdist score yield very similar outcomes when the reference and query alignments are close. However, for more divergent reference alignments the SPdist score is able to distinguish between methods that keep alignments approximately close to the reference and those exhibiting larger shifts. We observed that by using SPdist together with SP scoring we were able to better delineate the alignment quality difference between alternative MSA methods. With a case study we exemplify why it is important, from a biological perspective, to consider the separation of mismatches. The SPdist scoring scheme has been implemented in the VerAlign web server (http://www.ibi.vu.nl/programs/veralignwww/). The code for calculating the SPdist score is also available upon request. Public Library of Science 2015-05-19. /pmc/articles/PMC4438059/ /pubmed/25993129 http://dx.doi.org/10.1371/journal.pone.0127431 Text en © 2015 Bawono et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
repository_type |
Open Access Journal |
institution_category |
Foreign Institution |
institution |
US National Center for Biotechnology Information |
building |
NCBI PubMed |
collection |
Online Access |
language |
English |
format |
Online |
author |
Bawono, Punto; van der Velde, Arjan; Abeln, Sanne; Heringa, Jaap |
title |
Quantifying the Displacement of Mismatches in Multiple Sequence Alignment Benchmarks |
description |
Multiple Sequence Alignment (MSA) methods are typically benchmarked on sets of reference alignments. The quality of the alignment can then be represented by the sum-of-pairs (SP) or column (CS) scores, which measure the agreement between a reference and corresponding query alignment. Both the SP and CS scores treat mismatches between a query and reference alignment as equally bad, and do not take into account the separation in the query alignment between two amino acids that should have been matched according to the reference alignment. This is significant since the magnitude of alignment shifts is often of relevance in biological analyses, including homology modeling and MSA refinement/manual alignment editing. In this study we develop a new alignment benchmark scoring scheme, SPdist, that takes the degree of discordance of mismatches into account by measuring the sequence distance between mismatched residue pairs in the query alignment. Using this new score along with the standard SP score, we investigate the discriminatory behavior of the new score by assessing how well six different MSA methods perform with respect to BAliBASE reference alignments. The SP score and the SPdist score yield very similar outcomes when the reference and query alignments are close. However, for more divergent reference alignments the SPdist score is able to distinguish between methods that keep alignments approximately close to the reference and those exhibiting larger shifts. We observed that by using SPdist together with SP scoring we were able to better delineate the alignment quality difference between alternative MSA methods. With a case study we exemplify why it is important, from a biological perspective, to consider the separation of mismatches. The SPdist scoring scheme has been implemented in the VerAlign web server (http://www.ibi.vu.nl/programs/veralignwww/). The code for calculating the SPdist score is also available upon request. |
publisher |
Public Library of Science |
publishDate |
2015 |
url |
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4438059/ |
_version_ |
1613225718483779584 |
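
The abstract's central point, that the SP score counts every mismatch as equally bad regardless of how far apart the two residues end up, can be illustrated with a small Python sketch. This is not the authors' SPdist implementation, and the record does not give the published formula or its normalisation. The function names (`residue_columns`, `reference_pairs`, `sp_and_mean_shift`) are hypothetical, alignments are assumed to be lists of equal-length gapped strings with `-` as the gap character, and the column offset in the query is used here as a simple stand-in for the sequence distance the abstract describes.

```python
from itertools import combinations


def residue_columns(row, gap="-"):
    """Map each residue index in an aligned row to its column index."""
    return [col for col, char in enumerate(row) if char != gap]


def reference_pairs(reference, gap="-"):
    """Yield (seq_i, res_a, seq_j, res_b) for residues sharing a reference column."""
    for i, j in combinations(range(len(reference)), 2):
        a = b = 0  # running residue indices in sequences i and j
        for char_i, char_j in zip(reference[i], reference[j]):
            if char_i != gap and char_j != gap:
                yield i, a, j, b
            a += char_i != gap
            b += char_j != gap


def sp_and_mean_shift(reference, query, gap="-"):
    """Return (SP score, mean query column offset over reference-aligned pairs).

    Assumption: the column offset is an illustrative proxy only; it is not
    the SPdist definition from the paper.
    """
    columns = [residue_columns(row, gap) for row in query]
    matched = total = shift_sum = 0
    for i, a, j, b in reference_pairs(reference, gap):
        # Offset 0 means the pair is also aligned in the query (an SP match).
        shift = abs(columns[i][a] - columns[j][b])
        matched += shift == 0
        shift_sum += shift
        total += 1
    if total == 0:
        raise ValueError("reference alignment has no aligned residue pairs")
    return matched / total, shift_sum / total


reference = ["ACDE", "ACDE"]
query_near = ["ACDE-", "-ACDE"]      # every pair shifted by one column
query_far = ["ACDE---", "---ACDE"]   # every pair shifted by three columns

print(sp_and_mean_shift(reference, query_near))  # (0.0, 1.0)
print(sp_and_mean_shift(reference, query_far))   # (0.0, 3.0)
```

Both toy queries receive the same SP score of 0, yet the second is displaced three times as far from the reference, which is the kind of difference a distance-aware score is meant to expose.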