Automated and Accurate Estimation of Gene Family Abundance from Shotgun Metagenomes

Shotgun metagenomic DNA sequencing is a widely applicable tool for characterizing the functions that are encoded by microbial communities. Several bioinformatic tools can be used to functionally annotate metagenomes, allowing researchers to draw inferences about the functional potential of the commu...

Bibliographic Details
Main Authors: Nayfach, Stephen, Bradley, Patrick H., Wyman, Stacia K., Laurent, Timothy J., Williams, Alex, Eisen, Jonathan A., Pollard, Katherine S., Sharpton, Thomas J.
Format: Online
Language: English
Published: Public Library of Science 2015
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4643905/
id pubmed-4643905
recordtype oai_dc
spelling pubmed-4643905 2015-11-18 Automated and Accurate Estimation of Gene Family Abundance from Shotgun Metagenomes Nayfach, Stephen Bradley, Patrick H. Wyman, Stacia K. Laurent, Timothy J. Williams, Alex Eisen, Jonathan A. Pollard, Katherine S. Sharpton, Thomas J. Research Article Shotgun metagenomic DNA sequencing is a widely applicable tool for characterizing the functions that are encoded by microbial communities. Several bioinformatic tools can be used to functionally annotate metagenomes, allowing researchers to draw inferences about the functional potential of the community and to identify putative functional biomarkers. However, little is known about how decisions made during annotation affect the reliability of the results. Here, we use statistical simulations to rigorously assess how to optimize annotation accuracy and speed, given parameters of the input data like read length and library size. We identify best practices in metagenome annotation and use them to guide the development of the Shotgun Metagenome Annotation Pipeline (ShotMAP). ShotMAP is an analytically flexible, end-to-end annotation pipeline that can be implemented either on a local computer or a cloud compute cluster. We use ShotMAP to assess how different annotation databases impact the interpretation of how marine metagenome and metatranscriptome functional capacity changes across seasons. We also apply ShotMAP to data obtained from a clinical microbiome investigation of inflammatory bowel disease. This analysis finds that gut microbiota collected from Crohn’s disease patients are functionally distinct from gut microbiota collected from either ulcerative colitis patients or healthy controls, with differential abundance of metabolic pathways related to host-microbiome interactions that may serve as putative biomarkers of disease.
Public Library of Science 2015-11-13 /pmc/articles/PMC4643905/ /pubmed/26565399 http://dx.doi.org/10.1371/journal.pcbi.1004573 Text en © 2015 Nayfach et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
repository_type Open Access Journal
institution_category Foreign Institution
institution US National Center for Biotechnology Information
building NCBI PubMed
collection Online Access
language English
format Online
author Nayfach, Stephen
Bradley, Patrick H.
Wyman, Stacia K.
Laurent, Timothy J.
Williams, Alex
Eisen, Jonathan A.
Pollard, Katherine S.
Sharpton, Thomas J.
author_sort Nayfach, Stephen
title Automated and Accurate Estimation of Gene Family Abundance from Shotgun Metagenomes
description Shotgun metagenomic DNA sequencing is a widely applicable tool for characterizing the functions that are encoded by microbial communities. Several bioinformatic tools can be used to functionally annotate metagenomes, allowing researchers to draw inferences about the functional potential of the community and to identify putative functional biomarkers. However, little is known about how decisions made during annotation affect the reliability of the results. Here, we use statistical simulations to rigorously assess how to optimize annotation accuracy and speed, given parameters of the input data like read length and library size. We identify best practices in metagenome annotation and use them to guide the development of the Shotgun Metagenome Annotation Pipeline (ShotMAP). ShotMAP is an analytically flexible, end-to-end annotation pipeline that can be implemented either on a local computer or a cloud compute cluster. We use ShotMAP to assess how different annotation databases impact the interpretation of how marine metagenome and metatranscriptome functional capacity changes across seasons. We also apply ShotMAP to data obtained from a clinical microbiome investigation of inflammatory bowel disease. This analysis finds that gut microbiota collected from Crohn’s disease patients are functionally distinct from gut microbiota collected from either ulcerative colitis patients or healthy controls, with differential abundance of metabolic pathways related to host-microbiome interactions that may serve as putative biomarkers of disease.
publisher Public Library of Science
publishDate 2015
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4643905/
_version_ 1613500860325691392