Quantifying the Displacement of Mismatches in Multiple Sequence Alignment Benchmarks

Bibliographic Details
Main Authors: Bawono, Punto; van der Velde, Arjan; Abeln, Sanne; Heringa, Jaap
Format: Online
Language: English
Published: Public Library of Science 2015
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4438059/
Article Type: Research Article
Published Online: 2015-05-19
DOI: http://dx.doi.org/10.1371/journal.pone.0127431
PubMed ID: 25993129
PMC ID: PMC4438059
Repository: NCBI PubMed (US National Center for Biotechnology Information), open-access journal
License: © 2015 Bawono et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.

Abstract

Multiple Sequence Alignment (MSA) methods are typically benchmarked on sets of reference alignments. The quality of a query alignment can then be represented by the sum-of-pairs (SP) or column (CS) score, which measure the agreement between a reference alignment and the corresponding query alignment. Both the SP and CS scores treat all mismatches between a query and a reference alignment as equally bad: they do not take into account how far apart, in the query alignment, two amino acids end up when they should have been matched according to the reference alignment. This matters because the magnitude of alignment shifts is often relevant in biological analyses, including homology modeling and MSA refinement or manual alignment editing. In this study we develop a new alignment benchmark scoring scheme, SPdist, that accounts for the degree of discordance of mismatches by measuring the sequence distance between mismatched residue pairs in the query alignment. Using this new score alongside the standard SP score, we investigate its discriminatory behavior by assessing how well six different MSA methods perform with respect to BAliBASE reference alignments. The SP and SPdist scores yield very similar outcomes when the reference and query alignments are close. However, for more divergent reference alignments, the SPdist score distinguishes between methods that keep alignments approximately close to the reference and those exhibiting larger shifts. Using SPdist together with SP scoring, we were able to better delineate differences in alignment quality between alternative MSA methods. With a case study we illustrate why it is important, from a biological perspective, to consider the separation of mismatches.

The SPdist scoring scheme has been implemented in the VerAlign web server (http://www.ibi.vu.nl/programs/veralignwww/). The code for calculating the SPdist score is also available upon request.
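
The contrast drawn in the abstract, between SP (a reference residue pair is either reproduced exactly or not) and a score that also looks at how far a mismatched pair is displaced in the query, can be illustrated with a small Python sketch. This is not the authors' SPdist code or the VerAlign implementation; it is a minimal sketch under stated assumptions. All function names are hypothetical. Alignments are assumed to be dicts of equal-length gapped rows keyed by sequence name. Every column is scored, with no BAliBASE core-block restriction, and each reference pair is measured in both directions. A residue placed opposite a gap is given a half-step position between the flanking residues; this convention is a hypothetical choice. With `max_shift=0`, `sp_within` reduces to the usual SP score, the fraction of reference residue pairs reproduced in the query.

```python
from itertools import permutations
from typing import Dict, List, Optional

GAP_CHARS = set("-.")  # assumption: '-' and '.' are the only gap symbols


def residue_columns(row: str) -> List[Optional[int]]:
    """For each column, the 0-based residue index in this row, or None for a gap."""
    cols: List[Optional[int]] = []
    k = 0
    for ch in row:
        if ch in GAP_CHARS:
            cols.append(None)
        else:
            cols.append(k)
            k += 1
    return cols


def ungapped(row: str) -> str:
    return "".join(ch for ch in row if ch not in GAP_CHARS)


def displacements(reference: Dict[str, str], query: Dict[str, str]) -> List[float]:
    """Hypothetical shift for every residue pair aligned in the reference.

    For a reference pair (a_i, b_j), locate the query column holding a_i and read
    b's position there: its residue index k (shift |k - j|), or k - 0.5 if b has a
    gap, where k is the index of b's next residue. Shift 0 means an exact match.
    """
    if reference.keys() != query.keys():
        raise ValueError("reference and query must contain the same sequences")
    for aln in (reference, query):
        if len({len(row) for row in aln.values()}) > 1:
            raise ValueError("all rows of an alignment must have the same length")
    for name in reference:
        if ungapped(reference[name]) != ungapped(query[name]):
            raise ValueError(f"sequence {name!r} differs between alignments")

    ref = {name: residue_columns(row) for name, row in reference.items()}
    qry = {name: residue_columns(row) for name, row in query.items()}
    # Inverse map for the query: residue index -> column index, per sequence.
    qry_col_of = {name: {k: c for c, k in enumerate(cols) if k is not None}
                  for name, cols in qry.items()}

    shifts: List[float] = []
    for a, b in permutations(ref, 2):  # ordered pairs: each pair measured both ways
        for i, j in zip(ref[a], ref[b]):
            if i is None or j is None:
                continue  # only residue-residue pairs in the reference are scored
            col = qry_col_of[a][i]
            k = qry[b][col]
            if k is not None:
                shifts.append(float(abs(k - j)))
            else:
                next_k = sum(1 for x in qry[b][:col] if x is not None)
                shifts.append(abs((next_k - 0.5) - j))
    return shifts


def _nonempty_shifts(reference: Dict[str, str], query: Dict[str, str]) -> List[float]:
    shifts = displacements(reference, query)
    if not shifts:
        raise ValueError("reference alignment has no aligned residue pairs")
    return shifts


def sp_within(reference: Dict[str, str], query: Dict[str, str], max_shift: float) -> float:
    """Fraction of reference pairs whose query shift is at most max_shift."""
    shifts = _nonempty_shifts(reference, query)
    return sum(1 for s in shifts if s <= max_shift) / len(shifts)


def sp_score(reference: Dict[str, str], query: Dict[str, str]) -> float:
    """Plain SP score: fraction of reference residue pairs reproduced in the query."""
    return sp_within(reference, query, max_shift=0)


def mean_displacement(reference: Dict[str, str], query: Dict[str, str]) -> float:
    shifts = _nonempty_shifts(reference, query)
    return sum(shifts) / len(shifts)


if __name__ == "__main__":
    reference = {"s1": "ACDEF", "s2": "ACDEF"}
    shift1 = {"s1": "ACDEF-", "s2": "-ACDEF"}      # s2 shifted by one column
    shift3 = {"s1": "ACDEF---", "s2": "---ACDEF"}  # s2 shifted by three columns
    for label, q in [("shift1", shift1), ("shift3", shift3)]:
        print(label, sp_score(reference, q), sp_within(reference, q, 1),
              mean_displacement(reference, q))
    # shift1 0.0 1.0 0.9
    # shift3 0.0 0.2 2.1
```

In this toy example both queries have an SP score of 0, yet allowing a shift of one residue separates them (1.0 versus 0.2). This mirrors, in miniature, the kind of distinction between near-miss and strongly shifted alignments that the abstract attributes to SPdist.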