Translating nucleic acid binding protein function from model species to minor crops using transfer learning


Bibliographic Details
Main Author: Bonthala, Venkata Suresh
Format: Thesis (University of Nottingham only)
Language: English
Published: 2018
Subjects:
Online Access: https://eprints.nottingham.ac.uk/52289/
author Bonthala, Venkata Suresh
building Nottingham Research Data Repository
collection Online Access
description Genomic elements such as genes and proteins are the basic units of the genome and are involved in every biological process. Predicting the function of these elements is therefore the first step towards understanding how plants function under various stress conditions. To date, various computational methods have been developed to predict the function of a given protein sequence. The recent growth in the number of such methods has created its own problems, making them difficult to apply to newly sequenced genomes, especially those of non-model crops. For these reasons, there is an immediate need for more sophisticated computational methods to predict the function of a given protein sequence. This thesis presents three novel computational tools, based on transfer learning algorithms, for predicting the function of a given protein sequence: 1) TL-RBPPred, for prediction of RNA-binding proteins, outperformed SPOT-Seq, RNApred, RBPPred and BLASTp on HumanSet (AUC of 0.977), YeastSet (AUC of 0.971), ArabidopsisSet (AUC of 0.972) and GlymaxSet (AUC of 0.97); 2) TL-DBPPred, for prediction of DNA-binding proteins, outperformed DNABP, enDNA-Prot, iDNA-Prot, nDNAProt, iDNA-Prot|Dis, DNAbinder and BLASTp on a testing dataset (AUC of 0.988); and 3) TL-TFPred, for prediction of transcription factors, outperformed PlantTFcat, iTAK and BLASTp on a testing dataset (AUC of 0.999) in terms of prediction accuracy. Further, both TL-RBPPred and TL-DBPPred were tested on the transcriptome of the non-model crop Bambara groundnut (Vigna subterranea (L.) Verdc.) to identify RNA-binding and DNA-binding proteins, respectively. The results of these tests indicated that the two methods outperformed existing state-of-the-art tools such as SPOT-Seq, RBPPred, iDNA-Prot and iDNA-Prot|Dis in terms of prediction accuracy (AUC). Based on this performance, the developed methods will be useful for predicting the function of protein sequences (DNA-binding, RNA-binding and transcription factor) in model species as well as non-model crops.
first_indexed 2025-11-14T20:23:48Z
format Thesis (University of Nottingham only)
id nottingham-52289
institution University of Nottingham Malaysia Campus
institution_category Local University
language English
last_indexed 2025-11-14T20:23:48Z
publishDate 2018
recordtype eprints
repository_type Digital Repository
spelling nottingham-52289 2025-02-28T14:09:49Z https://eprints.nottingham.ac.uk/52289/ 2018-07-22 Thesis (University of Nottingham only) NonPeerReviewed application/pdf en arr https://eprints.nottingham.ac.uk/52289/1/Bonthala_Venkata_Suresh_Corrected_Thesis.pdf Bonthala, Venkata Suresh (2018) Translating nucleic acid binding protein function from model species to minor crops using transfer learning. PhD thesis, University of Nottingham.
title Translating nucleic acid binding protein function from model species to minor crops using transfer learning
topic bambara groundnut
non-model crops
translate protein function
transfer learning
url https://eprints.nottingham.ac.uk/52289/