The Genomic Scrapheap Challenge; Extracting Relevant Data from Unmapped Whole Genome Sequencing Reads, Including Strain Specific Genomic Segments, in Rats

Unmapped next-generation sequencing reads are typically ignored while they contain biologically relevant information. We systematically analyzed unmapped reads from whole genome sequencing of 33 inbred rat strains. High quality reads were selected and enriched for biologically relevant sequences; si...


Bibliographic Details
Main Authors: van der Weide, Robin H., Simonis, Marieke, Hermsen, Roel, Toonen, Pim, Cuppen, Edwin, de Ligt, Joep
Format: Online
Language: English
Published: Public Library of Science 2016
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4976967/
id pubmed-4976967
recordtype oai_dc
spelling pubmed-4976967 2016-08-25 The Genomic Scrapheap Challenge; Extracting Relevant Data from Unmapped Whole Genome Sequencing Reads, Including Strain Specific Genomic Segments, in Rats van der Weide, Robin H. Simonis, Marieke Hermsen, Roel Toonen, Pim Cuppen, Edwin de Ligt, Joep Research Article Unmapped next-generation sequencing reads are typically ignored while they contain biologically relevant information. We systematically analyzed unmapped reads from whole genome sequencing of 33 inbred rat strains. High quality reads were selected and enriched for biologically relevant sequences; similarity-based analysis revealed clustering similar to previously reported phylogenetic trees. Our results demonstrate that on average 20% of all unmapped reads harbor sequences that can be used to improve reference genomes and generate hypotheses on potential genotype-phenotype relationships. Analysis pipelines would benefit from incorporating the described methods and reference genomes would benefit from inclusion of the genomic segments obtained through these efforts. Public Library of Science 2016-08-08 /pmc/articles/PMC4976967/ /pubmed/27501045 http://dx.doi.org/10.1371/journal.pone.0160036 Text en © 2016 van der Weide et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
repository_type Open Access Journal
institution_category Foreign Institution
institution US National Center for Biotechnology Information
building NCBI PubMed
collection Online Access
language English
format Online
author van der Weide, Robin H.
Simonis, Marieke
Hermsen, Roel
Toonen, Pim
Cuppen, Edwin
de Ligt, Joep
author_sort van der Weide, Robin H.
title The Genomic Scrapheap Challenge; Extracting Relevant Data from Unmapped Whole Genome Sequencing Reads, Including Strain Specific Genomic Segments, in Rats
title_sort genomic scrapheap challenge; extracting relevant data from unmapped whole genome sequencing reads, including strain specific genomic segments, in rats
description Unmapped next-generation sequencing reads are typically ignored while they contain biologically relevant information. We systematically analyzed unmapped reads from whole genome sequencing of 33 inbred rat strains. High quality reads were selected and enriched for biologically relevant sequences; similarity-based analysis revealed clustering similar to previously reported phylogenetic trees. Our results demonstrate that on average 20% of all unmapped reads harbor sequences that can be used to improve reference genomes and generate hypotheses on potential genotype-phenotype relationships. Analysis pipelines would benefit from incorporating the described methods and reference genomes would benefit from inclusion of the genomic segments obtained through these efforts.
publisher Public Library of Science
publishDate 2016
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4976967/
_version_ 1613623386158661632