Further Improvements to Linear Mixed Models for Genome-Wide Association Studies

We examine improvements to the linear mixed model (LMM) that better correct for population structure and family relatedness in genome-wide association studies (GWAS). LMMs rely on the estimation of a genetic similarity matrix (GSM), which encodes the pairwise similarity between every two individuals...


Bibliographic Details
Main Authors: Widmer, Christian; Lippert, Christoph; Weissbrod, Omer; Fusi, Nicolo; Kadie, Carl; Davidson, Robert; Listgarten, Jennifer; Heckerman, David
Format: Online
Language: English
Published: Nature Publishing Group 2014
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4230738/
id pubmed-4230738
recordtype oai_dc
spelling pubmed-4230738 2014-11-17 Article Nature Publishing Group 2014-11-12 /pmc/articles/PMC4230738/ /pubmed/25387525 http://dx.doi.org/10.1038/srep06874 Text en Copyright © 2014, Macmillan Publishers Limited. All rights reserved. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder in order to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/
repository_type Open Access Journal
institution_category Foreign Institution
institution US National Center for Biotechnology Information
building NCBI PubMed
collection Online Access
author Widmer, Christian
Lippert, Christoph
Weissbrod, Omer
Fusi, Nicolo
Kadie, Carl
Davidson, Robert
Listgarten, Jennifer
Heckerman, David
author_sort Widmer, Christian
title Further Improvements to Linear Mixed Models for Genome-Wide Association Studies
description We examine improvements to the linear mixed model (LMM) that better correct for population structure and family relatedness in genome-wide association studies (GWAS). LMMs rely on the estimation of a genetic similarity matrix (GSM), which encodes the pairwise similarity between every two individuals in a cohort. These similarities are estimated from single nucleotide polymorphisms (SNPs) or other genetic variants. Traditionally, all available SNPs are used to estimate the GSM. In empirical studies across a wide range of synthetic and real data, we find that modifications to this approach improve GWAS performance as measured by type I error control and power. Specifically, when only population structure is present, a GSM constructed from SNPs that well predict the phenotype in combination with principal components as covariates controls type I error and yields more power than the traditional LMM. In any setting, with or without population structure or family relatedness, a GSM consisting of a mixture of two component GSMs, one constructed from all SNPs and another constructed from SNPs that well predict the phenotype again controls type I error and yields more power than the traditional LMM. Software implementing these improvements and the experimental comparisons are available at http://microsoft.com/science.
_version_ 1613156191896076288
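
The description in this record mentions a GSM estimated from SNPs and a GSM formed as a mixture of two component GSMs. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes genotypes coded 0/1/2, a GSM computed as the average outer product of standardized SNP columns, and a mixture expressed as a convex combination with a hypothetical `weight` parameter. The record does not say how the phenotype-predictive SNPs are selected or how the mixing weight is chosen, so the selection here is a placeholder (the first five columns) and the weight is arbitrary.

```python
import numpy as np


def gsm_from_snps(snps):
    """Estimate a genetic similarity matrix from an (individuals x SNPs) array.

    Assumption: each SNP column is standardized to zero mean and unit
    variance, and the GSM is the average outer product over SNPs.
    """
    snps = np.asarray(snps, dtype=float)
    std = snps.std(axis=0)
    keep = std > 0  # drop SNPs with no variation in this sample
    if not keep.any():
        raise ValueError("no polymorphic SNPs to build a GSM from")
    z = (snps[:, keep] - snps[:, keep].mean(axis=0)) / std[keep]
    return z @ z.T / z.shape[1]


def mixed_gsm(k_all, k_selected, weight):
    """Combine two GSMs as a convex combination (assumed form of the mixture)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * k_all + weight * k_selected


# Synthetic genotypes: 20 individuals, 40 SNPs coded as 0/1/2 allele counts.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(20, 40))

# Placeholder selection: a real analysis would pick SNPs that predict the phenotype.
selected = genotypes[:, :5]

k_all = gsm_from_snps(genotypes)
k_sel = gsm_from_snps(selected)
k_mix = mixed_gsm(k_all, k_sel, weight=0.3)
```

Because both components are symmetric and positive semidefinite, any convex combination of them is as well, which is what makes the mixture usable as a covariance term in an LMM.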