Differentially expressed genes from RNA-Seq and functional enrichment results are affected by the choice of single-end versus paired-end reads and stranded versus non-stranded protocols

Abstract Background RNA-Seq is now widely used as a research tool. Choices must be made whether to use paired-end (PE) or single-end (SE) sequencing, and whether to use strand-specific or non-specific (NS) library preparation kits. To date there has been no analysis of the effect of these choices on...

Bibliographic Details
Main Authors: Susan M. Corley, Karen L. MacKenzie, Annemiek Beverdam, Louise F. Roddam, Marc R. Wilkins
Format: Article
Language: English
Published: BioMed Central 2017-05-01
Series: BMC Genomics
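Subjects: RNA-Seq, Transcriptomics, Paired-end reads, Single-end reads, Differential expression, Strand-specific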
Online Access: http://link.springer.com/article/10.1186/s12864-017-3797-0
id doaj-art-e3622af371af4412aacec3d047c7a3ea
recordtype oai_dc
spelling doaj-art-e3622af371af4412aacec3d047c7a3ea 2018-08-16T00:53:20Z eng BioMed Central BMC Genomics 1471-2164 2017-05-01 18 1 1 13 10.1186/s12864-017-3797-0
Susan M. Corley (Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, UNSW Australia)
Karen L. MacKenzie (Children’s Cancer Institute Australia, Kensington New South Wales)
Annemiek Beverdam (School of Medical Sciences, UNSW Australia)
Louise F. Roddam (School of Medicine, University of Tasmania)
Marc R. Wilkins (Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, UNSW Australia)
institution Open Data Bank
collection Open Access Journals
building Directory of Open Access Journals
language English
format Article
author Susan M. Corley
Karen L. MacKenzie
Annemiek Beverdam
Louise F. Roddam
Marc R. Wilkins
title Differentially expressed genes from RNA-Seq and functional enrichment results are affected by the choice of single-end versus paired-end reads and stranded versus non-stranded protocols
publisher BioMed Central
series BMC Genomics
issn 1471-2164
publishDate 2017-05-01
description Abstract
Background: RNA-Seq is now widely used as a research tool. Choices must be made whether to use paired-end (PE) or single-end (SE) sequencing, and whether to use strand-specific or non-specific (NS) library preparation kits. To date there has been no analysis of the effect of these choices on identifying differentially expressed genes (DEGs) between controls and treated samples and on downstream functional analysis.
Results: We undertook four mammalian transcriptomics experiments to compare the effect of SE and PE protocols on read mapping, feature counting, identification of DEGs and functional analysis. For three of these experiments we also compared a non-stranded (NS) and a strand-specific approach to mapping the paired-end data. SE mapping resulted in fewer reads mapped to features in all four experiments, and a lower read count per gene. Up to 4.3% of genes in the SE data and up to 12.3% of genes in the NS data had read counts that differed significantly from the PE data. Comparison of DEGs showed that the SE reads produced false positives (average 5%, using voom) and false negatives (average 5%, using voom). These increased further, by one or two percentage points, with the NS data. Gene ontology (GO) functional enrichment of the DEGs arising from SE or NS approaches revealed striking differences in the top 20 GO terms, with as little as 40% concordance with PE results. Caution is therefore advised in the interpretation of such results. By comparison, there was overall consistency in gene set enrichment analysis results.
Conclusions: A strand-specific protocol should be used in library preparation to generate the most reliable and accurate profile of expression. Ideally, PE reads are also recommended, particularly for transcriptome assembly. Whilst SE reads produce a DEG list with around 5% of false positives and false negatives, this method can substantially reduce sequencing cost, and this saving could be used to increase the number of biological replicates, thereby increasing the power of the experiment. As SE reads, when used in association with gene set enrichment, can generate accurate biological results, this may be a desirable trade-off.
topic RNA-Seq
Transcriptomics
Paired-end reads
Single-end reads
Differential expression
Strand-specific
url http://link.springer.com/article/10.1186/s12864-017-3797-0
_version_ 1612696374289104896
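
The abstract reports false positives and false negatives in the SE-derived DEG list relative to the PE list, and concordance among the top 20 GO terms. The following Python sketch shows one plausible way such figures could be computed from two result lists. It is not the authors' code: the function names, the choice of denominators (candidate list size for false positives, reference list size for false negatives) and the toy identifiers are all assumptions made for illustration.

```python
def compare_deg_lists(reference, candidate):
    """Compare a candidate DEG list (e.g. SE) against a reference list (e.g. PE).

    Assumed definitions (not taken from the article):
      false positive = gene called in the candidate but not in the reference
      false negative = gene called in the reference but not in the candidate
    """
    ref, cand = set(reference), set(candidate)
    false_pos = cand - ref
    false_neg = ref - cand
    return {
        # fraction of the candidate list that the reference does not support
        "false_positive_fraction": len(false_pos) / len(cand) if cand else 0.0,
        # fraction of the reference list that the candidate missed
        "false_negative_fraction": len(false_neg) / len(ref) if ref else 0.0,
    }


def top_n_concordance(reference_terms, candidate_terms, n=20):
    """Fraction of the reference's top-n terms that also appear in the
    candidate's top-n. Both inputs are assumed to be ranked, best first."""
    ref_top = set(reference_terms[:n])
    cand_top = set(candidate_terms[:n])
    if not ref_top:
        return 0.0
    return len(ref_top & cand_top) / len(ref_top)


# Toy, hypothetical identifiers; real input would be gene IDs and GO term IDs.
pe_degs = ["G1", "G2", "G3", "G4"]
se_degs = ["G1", "G2", "G3", "G5"]
deg_summary = compare_deg_lists(pe_degs, se_degs)

pe_terms = ["a", "b", "c", "d", "e"]
se_terms = ["a", "b", "x", "y", "z"]
term_concordance = top_n_concordance(pe_terms, se_terms, n=5)
```

With real data, `reference` would hold the DEGs called from the stranded PE analysis and `candidate` those from the SE or NS analysis, using the same significance threshold for both.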