Identifying differential transcription factor binding in ChIP-seq
Main Authors: | Wu, Dai-Ying; Bittencourt, Danielle; Stallcup, Michael R.; Siegmund, Kimberly D. |
---|---|
Format: | Online |
Language: | English |
Published: | Frontiers Media S.A., 2015 |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4413818/ |
id | pubmed-4413818 |
---|---|
recordtype | oai_dc |
record date | 2015-05-13 |
subject | Genetics |
date | 2015-04-29 |
identifier | /pmc/articles/PMC4413818/; /pubmed/25972895; http://dx.doi.org/10.3389/fgene.2015.00169 |
type | Text |
rights | Copyright © 2015 Wu, Bittencourt, Stallcup and Siegmund. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), http://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
repository_type | Open Access Journal |
institution_category | Foreign Institution |
institution | US National Center for Biotechnology Information |
building | NCBI PubMed |
collection | Online Access |
author | Wu, Dai-Ying; Bittencourt, Danielle; Stallcup, Michael R.; Siegmund, Kimberly D. |
description | ChIP-seq is a widely used assay to measure genome-wide protein binding. The decreasing cost of sequencing has led to a rise in the number of studies that investigate protein binding across treatment conditions or cell lines. In addition to identifying binding sites, new studies evaluate the variation in protein binding between conditions. A number of approaches to study differential transcription factor binding have recently been developed, several of which build upon established RNA-seq methods to quantify differences in read counts. We compare how these new approaches perform on different data sets from the ENCODE project to illustrate the impact of data processing pipelines under different study designs. The performance of normalization methods for differential ChIP-seq depends strongly on the variation in the total amount of protein bound between conditions, with total read count outperforming effective library size, or variants thereof, when a large variation in binding was studied. Using input subtraction to correct for non-specific binding had a relatively modest impact on the number of differential peaks found and on the accuracy of fold changes relative to biological validation; however, a larger impact might be expected for samples with more extreme copy number variations between them. Still, it did identify a small subset of novel differential regions while excluding some differential peaks in regions with high background signal. These results highlight proper scaling for between-sample data normalization as critical for differential transcription factor binding analysis and suggest that bioinformaticians need to know the variation in the level of total protein binding between conditions to select the best analysis method. At the same time, validation using fold-change estimates from qRT-PCR suggests there is still room for further method improvement. |
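The normalization contrast in the description can be illustrated with a small sketch. The Python code below was written for this summary and is not the authors' pipeline. It compares scaling peak counts by total sequenced reads with scaling by an effective library size built from reads in peaks and a simplified, unweighted trimmed mean of log-ratios (in the spirit of edgeR's TMM, but without its abundance trim and precision weights). It also shows one simple, assumed form of input subtraction. All counts, library sizes, and function names (`log2_fold_changes`, `trimmed_mean_factor`, `subtract_input`) are hypothetical.

```python
import math

# Hypothetical reads in six peaks for a control (A) and a treated (B) sample.
# B has twice the binding at the first five peaks; the sixth is unchanged.
counts_a = [100, 200, 150, 300, 250, 120]
counts_b = [200, 400, 300, 600, 500, 120]
# Hypothetical total sequenced reads per sample (inside and outside peaks).
total_a = 10_000_000
total_b = 10_000_000


def log2_fold_changes(y_a, y_b, size_a, size_b):
    """Per-peak log2 ratio of size-scaled counts, B relative to A."""
    return [math.log2((b / size_b) / (a / size_a)) for a, b in zip(y_a, y_b)]


def trimmed_mean_factor(y_ref, y_obs, trim=0.3):
    """Simplified TMM-style scaling factor for y_obs relative to y_ref.

    Computes log-ratios of counts normalized by reads in peaks, drops the
    `trim` fraction from each end, and returns 2 ** mean. Unlike edgeR's TMM,
    there is no abundance (A-value) trim and no precision weighting.
    """
    n_ref, n_obs = sum(y_ref), sum(y_obs)
    m = sorted(
        math.log2((o / n_obs) / (r / n_ref))
        for r, o in zip(y_ref, y_obs)
        if r > 0 and o > 0  # log-ratios are undefined for zero counts
    )
    k = int(len(m) * trim)
    kept = m[k:len(m) - k]
    if not kept:
        raise ValueError("too few peaks left after trimming")
    return 2 ** (sum(kept) / len(kept))


def subtract_input(chip, control, chip_total, control_total):
    """Subtract input reads, scaled up to the ChIP library size, floored at 0."""
    scale = chip_total / control_total
    return [max(c - scale * i, 0.0) for c, i in zip(chip, control)]


# 1) Scale by total sequenced reads.
fc_total = log2_fold_changes(counts_a, counts_b, total_a, total_b)

# 2) Scale by effective library size: reads in peaks times the factor.
#    A is the reference sample, so its factor is implicitly 1.
factor_b = trimmed_mean_factor(counts_a, counts_b)
fc_eff = log2_fold_changes(counts_a, counts_b, sum(counts_a), sum(counts_b) * factor_b)

# Adding 0.0 turns a rounded -0.0 into 0.0 for display.
print("total reads:   ", [round(x, 2) + 0.0 for x in fc_total])
print("effective size:", [round(x, 2) + 0.0 for x in fc_eff])

# 3) Hypothetical input control for B, sequenced at half the depth; the
#    fourth peak sits in a high-background region.
input_b = [10, 20, 15, 350, 25, 20]
input_total_b = 5_000_000
print(subtract_input(counts_b, input_b, total_b, input_total_b))
# prints [180.0, 360.0, 270.0, 0.0, 450.0, 80.0]
```

In this toy example, scaling by total reads reports a log2 fold change of 1 for the five doubled peaks and 0 for the unchanged one. The trimmed-mean factor assumes most peaks do not change, so it absorbs the global increase: it reports roughly 0 for the doubled peaks and about -1 for the truly unchanged peak. This is consistent with the study's observation that total read count outperformed effective library size when total binding varied strongly between conditions. Input subtraction sets the fourth peak, whose scaled input exceeds its ChIP signal, to zero, mirroring how subtraction can exclude peaks in high-background regions.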