De novo assembly and genotyping of variants using colored de Bruijn graphs

Detecting genetic variants that are highly divergent from a reference sequence remains a major challenge in genome sequencing. We introduce de novo assembly algorithms using colored de Bruijn graphs for detecting and genotyping simple and complex genetic variants in an individual or population. We p...

Bibliographic Details
Main Authors: Iqbal, Zamin, Caccamo, Mario, Turner, Isaac, Flicek, Paul, McVean, Gil
Format: Online
Language: English
Published: 2012
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3272472/
id pubmed-3272472
recordtype oai_dc
spelling pubmed-3272472 2012-08-01 De novo assembly and genotyping of variants using colored de Bruijn graphs Iqbal, Zamin Caccamo, Mario Turner, Isaac Flicek, Paul McVean, Gil Article Detecting genetic variants that are highly divergent from a reference sequence remains a major challenge in genome sequencing. We introduce de novo assembly algorithms using colored de Bruijn graphs for detecting and genotyping simple and complex genetic variants in an individual or population. We provide an efficient software implementation, Cortex, the first de novo assembler capable of assembling multiple eukaryote genomes simultaneously. Four applications of Cortex are presented. First, we detect and validate both simple and complex structural variation in a high-coverage human genome. Second, we identify over 3 Mb of novel sequence in pooled low-coverage population sequence data from the 1000 Genomes Project. Third, we show how population information from 10 chimpanzees enables accurate variant calls without a reference sequence. Finally, we estimate classical HLA genotypes at HLA-B, the most variable gene in the human genome. 2012-01-08 /pmc/articles/PMC3272472/ /pubmed/22231483 http://dx.doi.org/10.1038/ng.1028 Text en Users may view, print, copy, download and text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
repository_type Open Access Journal
institution_category Foreign Institution
institution US National Center for Biotechnology Information
building NCBI PubMed
collection Online Access
language English
format Online
author Iqbal, Zamin
Caccamo, Mario
Turner, Isaac
Flicek, Paul
McVean, Gil
author_sort Iqbal, Zamin
title De novo assembly and genotyping of variants using colored de Bruijn graphs
description Detecting genetic variants that are highly divergent from a reference sequence remains a major challenge in genome sequencing. We introduce de novo assembly algorithms using colored de Bruijn graphs for detecting and genotyping simple and complex genetic variants in an individual or population. We provide an efficient software implementation, Cortex, the first de novo assembler capable of assembling multiple eukaryote genomes simultaneously. Four applications of Cortex are presented. First, we detect and validate both simple and complex structural variation in a high-coverage human genome. Second, we identify over 3 Mb of novel sequence in pooled low-coverage population sequence data from the 1000 Genomes Project. Third, we show how population information from 10 chimpanzees enables accurate variant calls without a reference sequence. Finally, we estimate classical HLA genotypes at HLA-B, the most variable gene in the human genome.
publishDate 2012
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3272472/
_version_ 1611503795523551232
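
Illustrative note: the description above names colored de Bruijn graphs without showing one. The following is a minimal sketch of the general idea, not Cortex's implementation or API. Every function name, the toy sequences, and the choice of k = 4 are assumptions made for illustration. The sketch also ignores reverse complements, sequencing errors, and coverage, all of which a real assembler has to handle. Each k-mer is a node, each node records the set of samples ("colors") that contain it, and a node with more than one successor marks a point where samples may diverge.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer (substring of length k) in seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_colored_graph(samples, k):
    """Map each k-mer (node) to the set of colors (sample names) containing it."""
    colors = defaultdict(set)
    for name, seqs in samples.items():
        for seq in seqs:
            for kmer in kmers(seq, k):
                colors[kmer].add(name)
    return dict(colors)


def successors(graph, kmer):
    """Nodes one edge away: k-mers in the graph that overlap this one's last k-1 bases."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in graph]


def branch_points(graph):
    """Nodes with more than one successor, i.e. where paths through the graph diverge."""
    result = {}
    for node in graph:
        nxt = successors(graph, node)
        if len(nxt) > 1:
            result[node] = nxt
    return result


# Hypothetical toy data: two sequences differing by a single base (C vs G).
samples = {
    "ref": ["ACGTACCTGA"],
    "sample": ["ACGTAGCTGA"],
}
graph = build_colored_graph(samples, k=4)

# k-mers carrying only the "sample" color lie on the variant branch.
private = sorted(kmer for kmer, cols in graph.items() if cols == {"sample"})
branches = branch_points(graph)
```

In this toy input the two sequences share the flanking k-mers and split at a single node, forming the "bubble" that variant callers on such graphs look for. Comparing the colors along each branch shows which sample carries which allele.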