Identifying differential transcription factor binding in ChIP-seq

ChIP-seq is a widely used assay to measure genome-wide protein binding. The decrease in costs associated with sequencing has led to a rise in the number of studies that investigate protein binding across treatment conditions or cell lines. In addition to the identification of binding sites, new stud...


Bibliographic Details
Main Authors: Wu, Dai-Ying, Bittencourt, Danielle, Stallcup, Michael R., Siegmund, Kimberly D.
Format: Online
Language: English
Published: Frontiers Media S.A. 2015
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4413818/
id pubmed-4413818
recordtype oai_dc
spelling pubmed-4413818 2015-05-13 Identifying differential transcription factor binding in ChIP-seq Wu, Dai-Ying Bittencourt, Danielle Stallcup, Michael R. Siegmund, Kimberly D. Genetics Frontiers Media S.A. 2015-04-29 /pmc/articles/PMC4413818/ /pubmed/25972895 http://dx.doi.org/10.3389/fgene.2015.00169 Text en Copyright © 2015 Wu, Bittencourt, Stallcup and Siegmund. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
repository_type Open Access Journal
institution_category Foreign Institution
institution US National Center for Biotechnology Information
building NCBI PubMed
collection Online Access
language English
format Online
author Wu, Dai-Ying
Bittencourt, Danielle
Stallcup, Michael R.
Siegmund, Kimberly D.
title Identifying differential transcription factor binding in ChIP-seq
description ChIP-seq is a widely used assay to measure genome-wide protein binding. The decrease in costs associated with sequencing has led to a rise in the number of studies that investigate protein binding across treatment conditions or cell lines. In addition to identifying binding sites, new studies evaluate the variation in protein binding between conditions. A number of approaches to study differential transcription factor binding have recently been developed. Several of these methods build upon established methods from RNA-seq to quantify differences in read counts. We compare how these new approaches perform on different data sets from the ENCODE project to illustrate the impact of data processing pipelines under different study designs. The performance of normalization methods for differential ChIP-seq depends strongly on the variation in the total amount of protein bound between conditions, with total read count outperforming effective library size, or variants thereof, when a large variation in binding was studied. Use of input subtraction to correct for non-specific binding had a relatively modest impact on the number of differential peaks found and on the accuracy of fold-change estimates relative to biological validation; however, a larger impact might be expected for samples with more extreme copy number variations between them. Still, input subtraction did identify a small subset of novel differential regions while excluding some differential peaks in regions with high background signal. These results highlight proper scaling for between-sample data normalization as critical for differential transcription factor binding analysis and suggest that bioinformaticians need to know about the variation in the level of total protein binding between conditions to select the best analysis method. At the same time, validation using fold-change estimates from qRT-PCR suggests there is still room for further method improvement.
publisher Frontiers Media S.A.
publishDate 2015
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4413818/