Bayesian Estimation of Nonsynonymous/Synonymous Rate Ratios for Pairwise Sequence Comparisons

The nonsynonymous/synonymous rate ratio (ω = dN/dS) is an important measure of the mode and strength of natural selection acting on nonsynonymous mutations in protein-coding genes. The simplest such analysis is the estimation of the dN/dS ratio using two sequences. Both heuristic counting methods an...

Full description

Bibliographic Details
Main Authors: Angelis, Konstantinos, dos Reis, Mario, Yang, Ziheng
Format: Online
Language:English
Published: Oxford University Press 2014
Online Access:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4069626/
id pubmed-4069626
recordtype oai_dc
spelling pubmed-40696262014-06-25 Bayesian Estimation of Nonsynonymous/Synonymous Rate Ratios for Pairwise Sequence Comparisons Angelis, Konstantinos dos Reis, Mario Yang, Ziheng Methods The nonsynonymous/synonymous rate ratio (ω = dN/dS) is an important measure of the mode and strength of natural selection acting on nonsynonymous mutations in protein-coding genes. The simplest such analysis is the estimation of the dN/dS ratio using two sequences. Both heuristic counting methods and the maximum-likelihood (ML) method based on a codon substitution model are widely used for such analysis. However, these methods do not have nice statistical properties, as the estimates can be zero or infinity in some data sets, so that their means and variances are infinite. In large genome-scale comparisons, such extreme estimates (either 0 or ∞) of ω and sequence distance (t) are common. Here, we implement a Bayesian method to estimate ω and t in pairwise sequence comparisons. Using a combination of computer simulation and real data analysis, we show that the Bayesian estimates have better statistical properties than the ML estimates, because the prior on ω and t shrinks the posterior of those parameters away from extreme values. We also calculate the posterior probability for ω > 1 as a Bayesian alternative to the likelihood ratio test. The new method is computationally efficient and may be useful for genome-scale comparisons of protein-coding gene sequences. Oxford University Press 2014-07 2014-04-18 /pmc/articles/PMC4069626/ /pubmed/24748652 http://dx.doi.org/10.1093/molbev/msu142 Text en © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
repository_type Open Access Journal
institution_category Foreign Institution
institution US National Center for Biotechnology Information
building NCBI PubMed
collection Online Access
language English
format Online
author Angelis, Konstantinos
dos Reis, Mario
Yang, Ziheng
spellingShingle Angelis, Konstantinos
dos Reis, Mario
Yang, Ziheng
Bayesian Estimation of Nonsynonymous/Synonymous Rate Ratios for Pairwise Sequence Comparisons
author_facet Angelis, Konstantinos
dos Reis, Mario
Yang, Ziheng
author_sort Angelis, Konstantinos
title Bayesian Estimation of Nonsynonymous/Synonymous Rate Ratios for Pairwise Sequence Comparisons
title_short Bayesian Estimation of Nonsynonymous/Synonymous Rate Ratios for Pairwise Sequence Comparisons
title_full Bayesian Estimation of Nonsynonymous/Synonymous Rate Ratios for Pairwise Sequence Comparisons
title_fullStr Bayesian Estimation of Nonsynonymous/Synonymous Rate Ratios for Pairwise Sequence Comparisons
title_full_unstemmed Bayesian Estimation of Nonsynonymous/Synonymous Rate Ratios for Pairwise Sequence Comparisons
title_sort bayesian estimation of nonsynonymous/synonymous rate ratios for pairwise sequence comparisons
description The nonsynonymous/synonymous rate ratio (ω = dN/dS) is an important measure of the mode and strength of natural selection acting on nonsynonymous mutations in protein-coding genes. The simplest such analysis is the estimation of the dN/dS ratio using two sequences. Both heuristic counting methods and the maximum-likelihood (ML) method based on a codon substitution model are widely used for such analysis. However, these methods do not have nice statistical properties, as the estimates can be zero or infinity in some data sets, so that their means and variances are infinite. In large genome-scale comparisons, such extreme estimates (either 0 or ∞) of ω and sequence distance (t) are common. Here, we implement a Bayesian method to estimate ω and t in pairwise sequence comparisons. Using a combination of computer simulation and real data analysis, we show that the Bayesian estimates have better statistical properties than the ML estimates, because the prior on ω and t shrinks the posterior of those parameters away from extreme values. We also calculate the posterior probability for ω > 1 as a Bayesian alternative to the likelihood ratio test. The new method is computationally efficient and may be useful for genome-scale comparisons of protein-coding gene sequences.
publisher Oxford University Press
publishDate 2014
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4069626/
_version_ 1612105725082861568