eDNAFlow, an automated, reproducible and scalable workflow for analysis of environmental DNA (eDNA) sequences exploiting Nextflow and Singularity

Metabarcoding of environmental DNA (eDNA) when coupled with high throughput sequencing is revolutionising the way biodiversity can be monitored across a wide range of applications. However, the large number of tools deployed in downstream bioinformatic analyses often places a challenge in configurat...

Bibliographic Details
Main Authors: Mousaviderazmahalleh, Mahsa Mousavi, Stott, Audrey, Lines, Rose, Peverley, Georgia, Nester, Georgia, Simpson, Tiffany, Zawierta, Michal, De La Pierre, Marco, Bunce, Michael, Christophersen, Claus
Format: Journal Article
Language: English
Published: Wiley-Blackwell 2021
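Subjects: Science & Technology; Life Sciences & Biomedicine; Biochemistry & Molecular Biology; Ecology; Evolutionary Biology; Environmental Sciences & Ecology; environmental DNA; metabarcoding; Nextflow; Singularity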
Online Access: http://hdl.handle.net/20.500.11937/86511
_version_ 1848764841701933056
author Mousaviderazmahalleh, Mahsa Mousavi
Stott, Audrey
Lines, Rose
Peverley, Georgia
Nester, Georgia
Simpson, Tiffany
Zawierta, Michal
De La Pierre, Marco
Bunce, Michael
Christophersen, Claus
author_facet Mousaviderazmahalleh, Mahsa Mousavi
Stott, Audrey
Lines, Rose
Peverley, Georgia
Nester, Georgia
Simpson, Tiffany
Zawierta, Michal
De La Pierre, Marco
Bunce, Michael
Christophersen, Claus
author_sort Mousaviderazmahalleh, Mahsa Mousavi
building Curtin Institutional Repository
collection Online Access
description Metabarcoding of environmental DNA (eDNA), when coupled with high-throughput sequencing, is revolutionising the way biodiversity can be monitored across a wide range of applications. However, the large number of tools deployed in downstream bioinformatic analyses often makes a workflow difficult to configure and maintain, and consequently limits research reproducibility. Furthermore, scalability needs to be considered to handle the growing amount of data that results from increases in sequence output and project scale. Here, we describe eDNAFlow, a fully automated workflow that employs a number of state-of-the-art applications to process eDNA data from raw sequences (single-end or paired-end) through to the generation of curated and noncurated zero-radius operational taxonomic units (ZOTUs) and their abundance tables. The pipeline is based on Nextflow and Singularity, which enable a scalable, portable and reproducible workflow using software containers on a local computer, clouds and high-performance computing (HPC) clusters. Finally, we present an in-house Python script that assigns taxonomy to ZOTUs based on user-specified thresholds for assigning the lowest common ancestor (LCA). We demonstrate the utility and efficiency of the pipeline using an example of a published coral diversity biomonitoring study; our results were congruent with that study. The scalability of the pipeline is also demonstrated through analysis of a large data set containing 154 samples. To our knowledge, this is the first automated bioinformatic pipeline for eDNA analysis that uses two powerful tools, Nextflow and Singularity. The pipeline addresses two major challenges in the analysis of eDNA data: scalability and reproducibility.
first_indexed 2025-11-14T11:25:46Z
format Journal Article
id curtin-20.500.11937-86511
institution Curtin University Malaysia
institution_category Local University
language English
last_indexed 2025-11-14T11:25:46Z
publishDate 2021
publisher Wiley-Blackwell
recordtype eprints
repository_type Digital Repository
spelling curtin-20.500.11937-865112021-11-29T05:24:20Z eDNAFlow, an automated, reproducible and scalable workflow for analysis of environmental DNA (eDNA) sequences exploiting Nextflow and Singularity Mousaviderazmahalleh, Mahsa Mousavi Stott, Audrey Lines, Rose Peverley, Georgia Nester, Georgia Simpson, Tiffany Zawierta, Michal De La Pierre, Marco Bunce, Michael Christophersen, Claus Science & Technology Life Sciences & Biomedicine Biochemistry & Molecular Biology Ecology Evolutionary Biology Environmental Sciences & Ecology environmental DNA metabarcoding Nextflow Singularity Metabarcoding of environmental DNA (eDNA) when coupled with high throughput sequencing is revolutionising the way biodiversity can be monitored across a wide range of applications. However, the large number of tools deployed in downstream bioinformatic analyses often places a challenge in configuration and maintenance of a workflow, and consequently limits the research reproducibility. Furthermore, scalability needs to be considered to handle the growing amount of data due to increase in sequence output and the scale of project. Here, we describe eDNAFlow, a fully automated workflow that employs a number of state-of-the-art applications to process eDNA data from raw sequences (single-end or paired-end) to generation of curated and noncurated zero-radius operational taxonomic units (ZOTUs) and their abundance tables. This pipeline is based on Nextflow and Singularity which enable a scalable, portable and reproducible workflow using software containers on a local computer, clouds and high-performance computing (HPC) clusters. Finally, we present an in-house Python script to assign taxonomy to ZOTUs based on user specified thresholds for assigning lowest common ancestor (LCA). We demonstrate the utility and efficiency of the pipeline using an example of a published coral diversity biomonitoring study. Our results were congruent with the aforementioned study. 
The scalability of the pipeline is also demonstrated through analysis of a large data set containing 154 samples. To our knowledge, this is the first automated bioinformatic pipeline for eDNA analysis using two powerful tools: Nextflow and Singularity. This pipeline addresses two major challenges in the analysis of eDNA data; scalability and reproducibility. 2021 Journal Article http://hdl.handle.net/20.500.11937/86511 10.1111/1755-0998.13356 English Wiley-Blackwell restricted
spellingShingle Science & Technology
Life Sciences & Biomedicine
Biochemistry & Molecular Biology
Ecology
Evolutionary Biology
Environmental Sciences & Ecology
environmental DNA
metabarcoding
Nextflow
Singularity
Mousaviderazmahalleh, Mahsa Mousavi
Stott, Audrey
Lines, Rose
Peverley, Georgia
Nester, Georgia
Simpson, Tiffany
Zawierta, Michal
De La Pierre, Marco
Bunce, Michael
Christophersen, Claus
eDNAFlow, an automated, reproducible and scalable workflow for analysis of environmental DNA (eDNA) sequences exploiting Nextflow and Singularity
title eDNAFlow, an automated, reproducible and scalable workflow for analysis of environmental DNA (eDNA) sequences exploiting Nextflow and Singularity
title_full eDNAFlow, an automated, reproducible and scalable workflow for analysis of environmental DNA (eDNA) sequences exploiting Nextflow and Singularity
title_fullStr eDNAFlow, an automated, reproducible and scalable workflow for analysis of environmental DNA (eDNA) sequences exploiting Nextflow and Singularity
title_full_unstemmed eDNAFlow, an automated, reproducible and scalable workflow for analysis of environmental DNA (eDNA) sequences exploiting Nextflow and Singularity
title_short eDNAFlow, an automated, reproducible and scalable workflow for analysis of environmental DNA (eDNA) sequences exploiting Nextflow and Singularity
title_sort ednaflow, an automated, reproducible and scalable workflow for analysis of environmental dna (edna) sequences exploiting nextflow and singularity
topic Science & Technology
Life Sciences & Biomedicine
Biochemistry & Molecular Biology
Ecology
Evolutionary Biology
Environmental Sciences & Ecology
environmental DNA
metabarcoding
Nextflow
Singularity
url http://hdl.handle.net/20.500.11937/86511
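
The abstract mentions assigning taxonomy to ZOTUs by lowest common ancestor (LCA) under user-specified thresholds, without giving details. The following is a minimal sketch of that general idea, not the authors' script: the function names, the hit format, and the two thresholds (`min_identity` and `max_identity_gap`) are assumptions chosen for illustration only.

```python
from typing import Dict, List, Sequence


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest shared prefix of the lineages (highest rank first)."""
    shared: List[str] = []
    if not lineages:
        return shared
    # zip(*lineages) walks the lineages rank by rank, stopping at the shortest.
    for names_at_rank in zip(*lineages):
        if len(set(names_at_rank)) == 1:
            shared.append(names_at_rank[0])
        else:
            break
    return shared


def assign_taxonomy(
    hits: Sequence[Dict],
    min_identity: float = 97.0,      # hypothetical threshold (percent identity)
    max_identity_gap: float = 1.0,   # hypothetical threshold (percentage points)
) -> List[str]:
    """Assign one ZOTU a lineage from its reference-database hits.

    Each hit is assumed to be a dict with an 'identity' value (percent) and a
    'lineage' list ordered from kingdom down to species.
    """
    # Discard hits below the minimum identity.
    kept = [h for h in hits if h["identity"] >= min_identity]
    if not kept:
        return []
    # Keep only hits whose identity is close to the best remaining hit.
    best = max(h["identity"] for h in kept)
    near_best = [h["lineage"] for h in kept if best - h["identity"] <= max_identity_gap]
    # Report the deepest rank on which all near-best hits agree.
    return lowest_common_ancestor(near_best)


if __name__ == "__main__":
    acropora = ["Animalia", "Cnidaria", "Anthozoa", "Scleractinia", "Acroporidae", "Acropora"]
    porites = ["Animalia", "Cnidaria", "Anthozoa", "Scleractinia", "Poritidae", "Porites"]
    example_hits = [
        {"identity": 99.0, "lineage": acropora + ["Acropora tenuis"]},
        {"identity": 98.6, "lineage": acropora + ["Acropora millepora"]},
        {"identity": 95.0, "lineage": porites + ["Porites lutea"]},
    ]
    print(assign_taxonomy(example_hits))
```

In this illustrative run the third hit falls below the minimum identity, and the two remaining hits disagree at species level, so the assignment stops at the genus they share. Tightening `max_identity_gap` so that only the best hit survives would instead return a species-level lineage.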