LinkImpute: Fast and Accurate Genotype Imputation for Nonmodel Organisms

Obtaining genome-wide genotype data from a set of individuals is the first step in many genomic studies, including genome-wide association and genomic selection. All genotyping methods suffer from some level of missing data, and genotype imputation can be used to fill in the missing data and improve...


Bibliographic Details
Main Authors: Money, Daniel, Gardner, Kyle, Migicovsky, Zoë, Schwaninger, Heidi, Zhong, Gan-Yuan, Myles, Sean
Format: Online
Language: English
Published: Genetics Society of America 2015
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4632058/
id pubmed-4632058
recordtype oai_dc
spelling pubmed-4632058 2015-11-04 Investigations Genetics Society of America 2015-09-15 /pmc/articles/PMC4632058/ /pubmed/26377960 http://dx.doi.org/10.1534/g3.115.021667 Text en Copyright © 2015 Money et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
repository_type Open Access Journal
institution_category Foreign Institution
institution US National Center for Biotechnology Information
building NCBI PubMed
collection Online Access
language English
format Online
author Money, Daniel
Gardner, Kyle
Migicovsky, Zoë
Schwaninger, Heidi
Zhong, Gan-Yuan
Myles, Sean
author_sort Money, Daniel
title LinkImpute: Fast and Accurate Genotype Imputation for Nonmodel Organisms
description Obtaining genome-wide genotype data from a set of individuals is the first step in many genomic studies, including genome-wide association and genomic selection. All genotyping methods suffer from some level of missing data, and genotype imputation can be used to fill in the missing data and improve the power of downstream analyses. Model organisms like human and cattle benefit from high-quality reference genomes and panels of reference genotypes that aid in imputation accuracy. In nonmodel organisms, however, genetic and physical maps often are either of poor quality or are completely absent, and there are no panels of reference genotypes available. There is therefore a need for imputation methods designed specifically for nonmodel organisms in which genomic resources are poorly developed and marker order is unreliable or unknown. Here we introduce LinkImpute, a software package based on a k-nearest neighbor genotype imputation method, LD-kNNi, which is designed for unordered markers. No physical or genetic maps are required, and it is designed to work on unphased genotype data from heterozygous species. It exploits the fact that markers useful for imputation often are not physically close to the missing genotype but rather distributed throughout the genome. Using genotyping-by-sequencing data from diverse and heterozygous accessions of apples, grapes, and maize, we compare LD-kNNi with several genotype imputation methods and show that LD-kNNi is fast, comparable in accuracy to the best-existing methods, and exhibits the least bias in allele frequency estimates.
publisher Genetics Society of America
publishDate 2015
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4632058/
_version_ 1613496955365752832
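
The description above says that LD-kNNi is a k-nearest neighbor method for unordered markers that draws on informative markers anywhere in the genome. The sketch below illustrates that general idea only; it is not the LinkImpute implementation. Everything in it is an assumption made for illustration: genotypes coded 0/1/2 with -1 for missing, squared Pearson correlation as a stand-in for linkage disequilibrium, mean absolute difference as the sample distance, inverse-distance vote weights, and the names r2, impute_one, l and k.

```python
import numpy as np

MISSING = -1  # assumed code for a missing genotype call


def r2(x, y):
    """Squared Pearson correlation over samples observed at both markers
    (used here as a simple stand-in for an LD measure)."""
    ok = (x != MISSING) & (y != MISSING)
    if ok.sum() < 3:
        return 0.0
    xs, ys = x[ok].astype(float), y[ok].astype(float)
    if xs.std() == 0 or ys.std() == 0:
        return 0.0
    return float(np.corrcoef(xs, ys)[0, 1] ** 2)


def impute_one(G, sample, marker, l=2, k=2):
    """Impute G[sample, marker] from the k nearest samples, where distance
    is measured only on the l markers most correlated with `marker`.
    Marker positions are never consulted, so marker order is irrelevant."""
    n_samples, n_markers = G.shape
    target = G[:, marker]

    # Rank all other markers by correlation with the target marker.
    scores = [(r2(target, G[:, m]), m) for m in range(n_markers) if m != marker]
    top = [m for _, m in sorted(scores, reverse=True)[:l]]

    # Distance from the focal sample to every sample genotyped at `marker`.
    candidates = []
    for s in range(n_samples):
        if s == sample or G[s, marker] == MISSING:
            continue
        usable = [m for m in top
                  if G[s, m] != MISSING and G[sample, m] != MISSING]
        if not usable:
            continue
        d = sum(abs(int(G[s, m]) - int(G[sample, m])) for m in usable) / len(usable)
        candidates.append((d, s))
    if not candidates:
        return MISSING

    # Inverse-distance weighted vote among the k nearest neighbors.
    votes = np.zeros(3)
    for d, s in sorted(candidates)[:k]:
        votes[G[s, marker]] += 1.0 / (d + 1e-6)
    return int(np.argmax(votes))
```

In this sketch l controls how many correlated markers define the neighborhood and k how many neighbors vote; how such parameters are actually chosen in LinkImpute is not stated in this record.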