Protein sequences classification based on weighting scheme

We present a new technique to recognize remote protein homologies that relies on combining probabilistic modeling and supervised learning in high-dimensional feature spaces. The main novelty of our technique is the method of constructing feature vectors using a Hidden Markov Model and the combination of...

Bibliographic Details
Main Authors: Zaki, N. M., Deris, Safaai, Md Illias, Rosli
Format: Article
Language: English
Published: Assumption University 2005
Subjects: T Technology (General)
Online Access: http://eprints.utm.my/5576/
http://eprints.utm.my/5576/1/N.M.Zaki2005_ProteinSequencesClassificationBasedOn.pdf
_version_ 1848891087194685440
author Zaki, N. M.
Deris, Safaai
Md Illias, Rosli
building UTeM Institutional Repository
collection Online Access
description We present a new technique to recognize remote protein homologies that relies on combining probabilistic modeling and supervised learning in high-dimensional feature spaces. The main novelty of our technique is the method of constructing feature vectors using a Hidden Markov Model and the combination of this representation with a classifier capable of learning in very sparse high-dimensional spaces. Each feature vector records the sensitivity of each protein domain to a previously learned set of sub-sequences (strings). Unlike previous methods, our method takes into consideration both the conserved and the non-conserved regions. The system subsequently uses Support Vector Machine (SVM) classifiers to learn the boundaries between structural protein classes. Experiments show that this method, which we call the String Weighting Scheme-SVM (SWS-SVM) method, significantly improves on previous methods for the classification of protein domains based on remote homologies. Our method is then compared to five existing homology detection methods.
first_indexed 2025-11-15T20:52:23Z
format Article
id utm-5576
institution Universiti Teknologi Malaysia
institution_category Local University
language English
last_indexed 2025-11-15T20:52:23Z
publishDate 2005
publisher Assumption University
recordtype eprints
repository_type Digital Repository
spelling utm-5576 2010-06-01T15:32:30Z http://eprints.utm.my/5576/ Protein sequences classification based on weighting scheme Zaki, N. M. Deris, Safaai Md Illias, Rosli T Technology (General) Assumption University 2005 Article PeerReviewed application/pdf en http://eprints.utm.my/5576/1/N.M.Zaki2005_ProteinSequencesClassificationBasedOn.pdf Zaki, N. M. and Deris, Safaai and Md Illias, Rosli (2005) Protein sequences classification based on weighting scheme. International Journal of Computer, the Internet and Management, 13 (1). pp. 50-60.
title Protein sequences classification based on weighting scheme
topic T Technology (General)
url http://eprints.utm.my/5576/
http://eprints.utm.my/5576/1/N.M.Zaki2005_ProteinSequencesClassificationBasedOn.pdf
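
The description in this record outlines a pipeline: represent each protein domain by its response to a learned set of sub-sequences, then train an SVM on those vectors. The following is a minimal sketch of that general idea, not the authors' SWS-SVM implementation. It makes several assumptions that are not in the record: the strings are simply all 3-mers seen in the training sequences, the "sensitivity" is a plain relative frequency (the Hidden Markov Model weighting described in the paper is not reproduced), and the classifier is a bias-free linear SVM trained by sub-gradient descent on the hinge loss. All function names, parameters, and the toy sequences are hypothetical.

```python
from collections import Counter

K = 3  # assumed sub-sequence length; the record does not specify one


def learn_strings(sequences, k=K):
    """Collect every k-mer in the training sequences.

    Hypothetical stand-in for the paper's previously learned set of strings.
    """
    return sorted({seq[i:i + k] for seq in sequences
                   for i in range(len(seq) - k + 1)})


def feature_vector(seq, strings, k=K):
    """Relative frequency of each learned string in seq.

    A simplified proxy for 'sensitivity'; no HMM weighting is applied.
    """
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = max(len(windows), 1)
    return [counts[s] / total for s in strings]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X, y, epochs=50, lr=0.5, lam=0.01):
    """Sub-gradient descent on the L2-regularised hinge loss (no bias term)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, label in zip(X, y):
            violated = label * dot(w, x) < 1       # margin check before update
            w = [(1 - lr * lam) * wi for wi in w]  # regularisation shrinkage
            if violated:
                w = [wi + lr * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if dot(w, x) > 0 else -1


# Toy data (invented for illustration): +1 and -1 stand for two structural classes.
train_seqs = ["ACDACDKACD", "MACDACDLL", "GGHGGHKGGH", "LLGGHGGHM"]
labels = [1, 1, -1, -1]

strings = learn_strings(train_seqs)
X = [feature_vector(s, strings) for s in train_seqs]
w = train_linear_svm(X, labels)
print([predict(w, x) for x in X])
```

The feature vectors here are sparse in the same sense as the description suggests: most learned strings do not occur in any given sequence, so most entries are zero. A real experiment would replace the frequency proxy with the paper's weighting scheme and use an established SVM library.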