enDNA-Prot: Identification of DNA-Binding Proteins by Applying Ensemble Learning

DNA-binding proteins are crucial for various cellular processes, such as recognition of specific nucleotides, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. Up to now, many methods have...


Bibliographic Details
Main Authors: Xu, Ruifeng, Zhou, Jiyun, Liu, Bin, Yao, Lin, He, Yulan, Zou, Quan, Wang, Xiaolong
Format: Online
Language: English
Published: Hindawi Publishing Corporation 2014
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4058174/
id pubmed-4058174
recordtype oai_dc
spelling pubmed-40581742014-06-29 enDNA-Prot: Identification of DNA-Binding Proteins by Applying Ensemble Learning Xu, Ruifeng Zhou, Jiyun Liu, Bin Yao, Lin He, Yulan Zou, Quan Wang, Xiaolong Research Article DNA-binding proteins are crucial for various cellular processes, such as recognition of specific nucleotides, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. Up to now, many methods have been proposed, but most of them focus on only one classifier and cannot make full use of the large number of negative samples to improve prediction performance. This study proposed a predictor called enDNA-Prot for DNA-binding protein identification by employing the ensemble learning technique. Experimental results showed that enDNA-Prot was comparable with DNA-Prot and outperformed DNAbinder and iDNA-Prot with performance improvement in the range of 3.97–9.52% in ACC and 0.08–0.19 in MCC. Furthermore, when the benchmark dataset was expanded with negative samples, enDNA-Prot outperformed the three existing methods by 2.83–16.63% in terms of ACC and 0.02–0.16 in terms of MCC. These results indicate that enDNA-Prot is an effective method for DNA-binding protein identification and that expanding the training dataset with negative samples can improve its performance. For the convenience of the vast majority of experimental scientists, we developed a user-friendly web server for enDNA-Prot which is freely accessible to the public. Hindawi Publishing Corporation 2014 2014-05-26 /pmc/articles/PMC4058174/ /pubmed/24977146 http://dx.doi.org/10.1155/2014/294279 Text en Copyright © 2014 Ruifeng Xu et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
repository_type Open Access Journal
institution_category Foreign Institution
institution US National Center for Biotechnology Information
building NCBI PubMed
collection Online Access
language English
format Online
author Xu, Ruifeng
Zhou, Jiyun
Liu, Bin
Yao, Lin
He, Yulan
Zou, Quan
Wang, Xiaolong
title enDNA-Prot: Identification of DNA-Binding Proteins by Applying Ensemble Learning
description DNA-binding proteins are crucial for various cellular processes, such as recognition of specific nucleotides, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. Up to now, many methods have been proposed, but most of them focus on only one classifier and cannot make full use of the large number of negative samples to improve prediction performance. This study proposed a predictor called enDNA-Prot for DNA-binding protein identification by employing the ensemble learning technique. Experimental results showed that enDNA-Prot was comparable with DNA-Prot and outperformed DNAbinder and iDNA-Prot with performance improvement in the range of 3.97–9.52% in ACC and 0.08–0.19 in MCC. Furthermore, when the benchmark dataset was expanded with negative samples, enDNA-Prot outperformed the three existing methods by 2.83–16.63% in terms of ACC and 0.02–0.16 in terms of MCC. These results indicate that enDNA-Prot is an effective method for DNA-binding protein identification and that expanding the training dataset with negative samples can improve its performance. For the convenience of the vast majority of experimental scientists, we developed a user-friendly web server for enDNA-Prot which is freely accessible to the public.
publisher Hindawi Publishing Corporation
publishDate 2014
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4058174/