Leveraging Hierarchical Population Structure in Discrete Association Studies

Population structure can confound the identification of correlations in biological data. Such confounding has been recognized in multiple biological disciplines, resulting in a disparate collection of proposed solutions. We examine several methods that correct for confounding on discrete data with h...

Bibliographic Details
Main Authors: Carlson, Jonathan; Kadie, Carl; Mallal, Simon; Heckerman, David
Format: Online
Language: English
Published: Public Library of Science 2007
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1899226/
id pubmed-1899226
recordtype oai_dc
spelling pubmed-1899226; 2007-07-04; Research Article; Public Library of Science; /pmc/articles/PMC1899226/; /pubmed/17611623; http://dx.doi.org/10.1371/journal.pone.0000591; Text; en; Carlson et al.; http://creativecommons.org/licenses/by/4.0/; This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
repository_type Open Access Journal
institution_category Foreign Institution
institution US National Center for Biotechnology Information
building NCBI PubMed
collection Online Access
language English
format Online
author Carlson, Jonathan
Kadie, Carl
Mallal, Simon
Heckerman, David
title Leveraging Hierarchical Population Structure in Discrete Association Studies
description Population structure can confound the identification of correlations in biological data. Such confounding has been recognized in multiple biological disciplines, resulting in a disparate collection of proposed solutions. We examine several methods that correct for confounding on discrete data with hierarchical population structure and identify two distinct confounding processes, which we call coevolution and conditional influence. We describe these processes in terms of generative models and show that these generative models can be used to correct for the confounding effects. Finally, we apply the models to three applications: identification of escape mutations in HIV-1 in response to specific HLA-mediated immune pressure, prediction of coevolving residues in an HIV-1 peptide, and a search for genotypes that are associated with bacterial resistance traits in Arabidopsis thaliana. We show that coevolution is a better description of confounding in some applications and conditional influence is better in others. That is, we show that no single method is best for addressing all forms of confounding. Analysis tools based on these models are available on the internet as both web-based applications and downloadable source code at http://atom.research.microsoft.com/bio/phylod.aspx.
publisher Public Library of Science
publishDate 2007
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1899226/