Inferring Phylogenies from RAD Sequence Data

Bibliographic Details
Main Authors: Rubin, Benjamin E. R., Ree, Richard H., Moreau, Corrie S.
Format: Online
Language: English
Published: Public Library of Science 2012
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3320897/
id pubmed-3320897
recordtype oai_dc
spelling pubmed-3320897 2012-04-10 Inferring Phylogenies from RAD Sequence Data Rubin, Benjamin E. R. Ree, Richard H. Moreau, Corrie S. Research Article Public Library of Science 2012-04-06 /pmc/articles/PMC3320897/ /pubmed/22493668 http://dx.doi.org/10.1371/journal.pone.0033394 Text en Rubin et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
repository_type Open Access Journal
institution_category Foreign Institution
institution US National Center for Biotechnology Information
building NCBI PubMed
collection Online Access
description Reduced-representation genome sequencing represents a new source of data for systematics, and its potential utility in interspecific phylogeny reconstruction has not yet been explored. One approach that seems especially promising is the use of inexpensive short-read technologies (e.g., Illumina, SOLiD) to sequence restriction-site associated DNA (RAD) – the regions of the genome that flank the recognition sites of restriction enzymes. In this study, we simulated the collection of RAD sequences from sequenced genomes of different taxa (Drosophila, mammals, and yeasts) and developed a proof-of-concept workflow to test whether informative data could be extracted and used to accurately reconstruct “known” phylogenies of species within each group. The workflow consists of three basic steps: first, sequences are clustered by similarity to estimate orthology; second, clusters are filtered by taxonomic coverage; and third, they are aligned and concatenated for “total evidence” phylogenetic analysis. We evaluated the performance of clustering and filtering parameters by comparing the resulting topologies with well-supported reference trees and we were able to identify conditions under which the reference tree was inferred with high support. For Drosophila, whole genome alignments allowed us to directly evaluate which parameters most consistently recovered orthologous sequences. For the parameter ranges explored, we recovered the best results at the low ends of sequence similarity and taxonomic representation of loci; these generated the largest supermatrices with the highest proportion of missing data. Applications of the method to mammals and yeasts were less successful, which we suggest may be due partly to their much deeper evolutionary divergence times compared to Drosophila (crown ages of approximately 100 and 300 versus 60 Mya, respectively). RAD sequences thus appear to hold promise for reconstructing phylogenetic relationships in younger clades in which sufficient numbers of orthologous restriction sites are retained across species.
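
The second and third workflow steps named in the abstract (filtering clusters by taxonomic coverage, then concatenating them into a supermatrix) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`filter_by_coverage`, `concatenate`, `missing_fraction`), the use of `N` as the missing-data symbol, and the toy loci are all assumptions made for illustration, and the loci are taken to be already clustered and aligned.

```python
from typing import Dict, List

# One locus = one cluster of putatively orthologous, already aligned sequences,
# keyed by taxon name. All names and data here are hypothetical.
Locus = Dict[str, str]

def filter_by_coverage(loci: List[Locus], min_taxa: int) -> List[Locus]:
    """Keep non-empty loci that are present in at least `min_taxa` taxa."""
    return [locus for locus in loci if locus and len(locus) >= min_taxa]

def concatenate(loci: List[Locus], taxa: List[str], missing: str = "N") -> Dict[str, str]:
    """Join loci end to end; taxa absent from a locus are padded with `missing`."""
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for locus in loci:
        length = len(next(iter(locus.values())))  # aligned, so all lengths match
        for taxon in taxa:
            parts[taxon].append(locus.get(taxon, missing * length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}

def missing_fraction(matrix: Dict[str, str], missing: str = "N") -> float:
    """Share of supermatrix cells holding the missing symbol.

    Note: genuine ambiguous `N` calls in the data would also be counted.
    """
    total = sum(len(seq) for seq in matrix.values())
    absent = sum(seq.count(missing) for seq in matrix.values())
    return absent / total if total else 0.0

# Toy input: three taxa, three loci with different taxonomic coverage.
taxa = ["A", "B", "C"]
loci = [
    {"A": "ACGT", "B": "ACGA", "C": "ACGG"},  # present in all three taxa
    {"A": "TT", "B": "TA"},                   # present in two taxa
    {"C": "GGG"},                             # present in one taxon
]

strict = concatenate(filter_by_coverage(loci, min_taxa=3), taxa)
relaxed = concatenate(filter_by_coverage(loci, min_taxa=1), taxa)
print(len(strict["A"]), missing_fraction(strict))
print(len(relaxed["A"]), missing_fraction(relaxed))
```

On this toy input, lowering `min_taxa` from 3 to 1 lengthens each row of the supermatrix from 4 to 9 sites and raises the missing fraction from 0 to 8/27, which mirrors the trade-off the abstract reports: low taxonomic-representation thresholds yield the largest supermatrices with the most missing data.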