Using the Pareto principle in genome-wide breeding value estimation

Genome-wide breeding value (GWEBV) estimation methods can be classified based on the prior distribution assumptions of marker effects. Genome-wide BLUP methods assume a normal prior distribution for all markers with a constant variance, and are computationally fast. In Bayesian methods, more flexibl...


Bibliographic Details
Main Authors: Yu, Xijiang; Meuwissen, Theo HE
Format: Online
Language: English
Published: BioMed Central 2011
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3354342/
id pubmed-3354342
recordtype oai_dc
spelling pubmed-3354342 2012-05-18 Yu, Xijiang; Meuwissen, Theo HE Research BioMed Central 2011-11-01 /pmc/articles/PMC3354342/ /pubmed/22044555 http://dx.doi.org/10.1186/1297-9686-43-35 Text en Copyright ©2011 Yu and Meuwissen; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
repository_type Open Access Journal
institution_category Foreign Institution
institution US National Center for Biotechnology Information
building NCBI PubMed
collection Online Access
language English
format Online
author Yu, Xijiang
Meuwissen, Theo HE
title Using the Pareto principle in genome-wide breeding value estimation
description Genome-wide breeding value (GWEBV) estimation methods can be classified by the prior distribution they assume for marker effects. Genome-wide BLUP methods assume a normal prior with a constant variance for all markers and are computationally fast. Bayesian methods apply more flexible priors for SNP effects that allow a few very large effects while most are small or even zero, but these methods are often computationally demanding because they rely on Markov chain Monte Carlo sampling. In this study, we adopted the Pareto principle to weight the available marker loci, i.e., we consider that x% of the loci explain (100 - x)% of the total genetic variance. Under this principle, the variances of the prior distributions of the 'big' and 'small' SNP can also be defined: the relatively few SNP with big effects explain a large proportion of the genetic variance, whereas the majority of SNP have small effects and explain a minor proportion. We name this method MixP; its prior is a mixture of two normal distributions, one with a big variance and one with a small variance. Simulation results, using a real Norwegian Red cattle pedigree, show that MixP is at least as accurate as the other methods in all studied cases. The method also reduces the number of hyper-parameters of the prior from 2 (proportion and variance of SNP with big effects) to 1 (proportion of SNP with big effects), assuming the overall genetic variance is known. The mixture-of-normals prior made it possible to solve the equations iteratively, which reduced the computational load by two orders of magnitude. As marker densities reach millions and whole-genome sequence data become available, MixP provides a computationally feasible Bayesian method of analysis.
publisher BioMed Central
publishDate 2011
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3354342/
_version_ 1611530465827618816
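
Note on the abstract: the statement that x% of the loci explain (100 - x)% of the genetic variance can be turned into two per-locus prior variances once the total genetic variance is taken as known. The Python sketch below illustrates that arithmetic only. It is not the authors' implementation: the function names are hypothetical, and it assumes standardized genotypes so that every locus in a class contributes the same variance, which may differ from the parameterization used in the paper. The second function applies the ordinary Bayes rule for a two-component normal mixture to a single effect value and ignores estimation error.

```python
import math


def mixp_prior_variances(pi_big, n_loci, genetic_variance):
    """Per-locus prior variances under the Pareto-style split.

    Assumption (not from the record): a fraction pi_big of the loci shares
    (1 - pi_big) of the genetic variance equally, and the remaining loci
    share pi_big of it equally.
    """
    if not 0.0 < pi_big < 1.0:
        raise ValueError("pi_big must lie strictly between 0 and 1")
    if n_loci <= 0 or genetic_variance <= 0.0:
        raise ValueError("n_loci and genetic_variance must be positive")
    n_big = pi_big * n_loci            # expected number of 'big' loci
    n_small = (1.0 - pi_big) * n_loci  # expected number of 'small' loci
    var_big = (1.0 - pi_big) * genetic_variance / n_big
    var_small = pi_big * genetic_variance / n_small
    return var_big, var_small


def _log_normal_pdf(x, variance):
    """Log density of a zero-mean normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * variance) + x * x / variance)


def prob_big(effect, pi_big, var_big, var_small):
    """Probability that an effect belongs to the 'big' mixture component."""
    log_big = math.log(pi_big) + _log_normal_pdf(effect, var_big)
    log_small = math.log(1.0 - pi_big) + _log_normal_pdf(effect, var_small)
    return 1.0 / (1.0 + math.exp(log_small - log_big))


if __name__ == "__main__":
    # Illustrative values only: 20% of 1000 loci explain 80% of the variance.
    vb, vs = mixp_prior_variances(pi_big=0.2, n_loci=1000, genetic_variance=1.0)
    print("big variance:", vb, "small variance:", vs)
    print("P(big | effect = 0.2):", prob_big(0.2, 0.2, vb, vs))
```

With this split, the proportion of SNP with big effects is the only free hyper-parameter, which matches the reduction from 2 hyper-parameters to 1 described in the abstract.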