enDNA-Prot: Identification of DNA-Binding Proteins by Applying Ensemble Learning
DNA-binding proteins are crucial for various cellular processes, such as recognition of specific nucleotides, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. Up to now, many methods have...
Main Authors: | Xu, Ruifeng; Zhou, Jiyun; Liu, Bin; Yao, Lin; He, Yulan; Zou, Quan; Wang, Xiaolong |
---|---|
Format: | Online |
Language: | English |
Published: | Hindawi Publishing Corporation, 2014 |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4058174/ |
id | pubmed-4058174 |
---|---|
recordtype | oai_dc |
spelling | pubmed-4058174 2014-06-29; Research Article; Hindawi Publishing Corporation 2014; 2014-05-26; /pmc/articles/PMC4058174/; /pubmed/24977146; http://dx.doi.org/10.1155/2014/294279; Text; en; Copyright © 2014 Ruifeng Xu et al.; https://creativecommons.org/licenses/by/3.0/; This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
repository_type | Open Access Journal |
institution_category | Foreign Institution |
institution | US National Center for Biotechnology Information |
building | NCBI PubMed |
collection | Online Access |
language | English |
format | Online |
author | Xu, Ruifeng; Zhou, Jiyun; Liu, Bin; Yao, Lin; He, Yulan; Zou, Quan; Wang, Xiaolong |
title | enDNA-Prot: Identification of DNA-Binding Proteins by Applying Ensemble Learning |
description | DNA-binding proteins are crucial for various cellular processes, such as recognition of specific nucleotides, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. Many methods have been proposed, but most rely on a single classifier and cannot make full use of the large number of available negative samples to improve prediction performance. This study proposes a predictor called enDNA-Prot that identifies DNA-binding proteins by employing ensemble learning. Experimental results showed that enDNA-Prot was comparable with DNA-Prot and outperformed DNAbinder and iDNA-Prot, with improvements of 3.97–9.52% in accuracy (ACC) and 0.08–0.19 in Matthews correlation coefficient (MCC). Furthermore, when the benchmark dataset was expanded with negative samples, enDNA-Prot outperformed the three existing methods by 2.83–16.63% in ACC and 0.02–0.16 in MCC. These results indicate that enDNA-Prot is an effective method for DNA-binding protein identification and that expanding the training dataset with negative samples can improve its performance. For the convenience of experimental scientists, we developed a user-friendly web server for enDNA-Prot, which is freely accessible to the public. |
publisher | Hindawi Publishing Corporation |
publishDate | 2014 |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4058174/ |
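The abstract reports results but does not describe how the enDNA-Prot ensemble is built. As a hedged illustration of the general idea only, the Python sketch below shows one common way an ensemble can exploit a large pool of negative samples. The negatives are split into disjoint slices, each member classifier is trained on all positives plus one slice, and the members vote. The nearest-centroid base learner, the striding split, the tie rule, and every function name here are hypothetical choices, not enDNA-Prot's actual features or classifiers. The sketch also computes the two metrics the abstract reports: ACC = (TP + TN) / (TP + TN + FP + FN) and MCC = (TP·TN − FP·FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)).

```python
# Hypothetical sketch of a negative-splitting voting ensemble; NOT the enDNA-Prot implementation.
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def centroid(rows: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Hypothetical base learner: label 1 if x is at least as close to the positive centroid."""

    def fit(self, pos: List[Vector], neg: List[Vector]) -> "NearestCentroid":
        self.c_pos = centroid(pos)
        self.c_neg = centroid(neg)
        return self

    def predict(self, x: Vector) -> int:
        return 1 if sq_dist(x, self.c_pos) <= sq_dist(x, self.c_neg) else 0


def train_ensemble(pos: List[Vector], neg: List[Vector],
                   n_members: int, seed: int = 0) -> List[NearestCentroid]:
    """Train one member per disjoint slice of the shuffled negatives; every member sees all positives."""
    if not pos or not 1 <= n_members <= len(neg):
        raise ValueError("need positives and 1 <= n_members <= len(neg)")
    shuffled = list(neg)
    random.Random(seed).shuffle(shuffled)
    # Striding yields exactly n_members non-empty, disjoint slices that together cover all negatives.
    slices = [shuffled[i::n_members] for i in range(n_members)]
    return [NearestCentroid().fit(pos, s) for s in slices]


def predict_vote(members: List[NearestCentroid], x: Vector) -> int:
    """Majority vote; a tie (possible with an even number of members) counts as negative."""
    votes = sum(m.predict(x) for m in members)
    return 1 if 2 * votes > len(members) else 0


def acc_mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and Matthews correlation coefficient for binary labels (1 = DNA-binding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


if __name__ == "__main__":
    # Toy 2-D features for illustration; real predictors use sequence-derived features.
    pos = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9]]
    neg = [[-1.0, -1.0], [-0.9, -1.2], [-1.1, -0.8],
           [-1.2, -1.1], [-0.8, -0.9], [-1.0, -1.3]]
    members = train_ensemble(pos, neg, n_members=3)
    test_x = [[0.9, 1.1], [-1.0, -0.9], [1.2, 1.0], [-0.7, -1.1]]
    test_y = [1, 0, 1, 0]
    preds = [predict_vote(members, x) for x in test_x]
    print(preds, acc_mcc(test_y, preds))  # prints: [1, 0, 1, 0] (1.0, 1.0)
```

In this sketch, splitting the negatives keeps each member's training set closer to balanced, while the ensemble as a whole still uses every negative sample; enlarging the negative pool simply enlarges the slices or allows more members.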