The development of semantic meta-database: an ontology based semantic integration of biological databases

Protein sequence annotation is important for preserving and reusing knowledge, for content-based queries, and for understanding protein function. Traditional wet-lab methods are labor-intensive and prone to human error. Alternatively, existing tools are time intensive and require high inve...

Bibliographic Details
Main Authors: Samsudin, Ruhaidah, Deris, Safaai, Othman, Muhammad Razib, Md. Illias, Rosli
Format: Monograph
Language: English
Published: Faculty of Computer Science and Information System 2007
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/4141/
http://eprints.utm.my/4141/1/74289.pdf
author Samsudin, Ruhaidah
Deris, Safaai
Othman, Muhammad Razib
Md. Illias, Rosli
building UTeM Institutional Repository
collection Online Access
description Protein sequence annotation is important for preserving and reusing knowledge, for content-based queries, and for understanding protein function. Traditional wet-lab methods are labor-intensive and prone to human error. Existing computational tools, in turn, are time-intensive and require a high investment in computing facilities when used offline, and they depend heavily on internet stability and speed when used online. A simple and practical computational method that is more accurate, faster, easy to configure and use, and low in computing cost is therefore needed, particularly for offline usage. In this study, a Gene Ontology (GO) based protein sequence annotation tool named extended UTMGO is developed to meet these requirements. GO is selected because it provides dynamic, precisely defined, structured, and controlled terms that describe genes, their functions, and their products in any organism. Furthermore, Gene Ontology Annotation (GOA) links the GO terms to gene products and their protein sequences from various species. Highly correlated GO terms of annotated protein sequences can thus be assigned to partially annotated or newly discovered protein sequences. The tool comprises two intelligent algorithms. The first combines a parallel genetic algorithm with the split-and-merge algorithm. The idea is to cluster the GO terms into k clusters so that the monolithic GO RDF/XML file can be split into smaller files, which in turn allows protein sequences and Inferred from Electronic Annotation (IEA) evidence associations to be included in those files. The second combines a parallel genetic algorithm with a semantic similarity measure algorithm. Its purpose is to search the fragmented GO RDF/XML files for a set of GO terms that are semantically similar to a given query. In addition, a basic version of the tool, a GO browser based on semantic similarity search, is introduced to overcome the weaknesses of the conventional approach, keyword matching.
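The description names semantic similarity search over GO terms as the alternative to keyword matching, but it does not say which similarity measure the tool uses. As a minimal sketch of the general idea only, assuming a Jaccard overlap of ancestor sets on a toy is_a hierarchy (the term names, the `PARENTS` table, and the functions `ancestors`, `similarity`, and `most_similar` are all hypothetical, and the parallel genetic algorithm is not modeled), a query term could be ranked against the other terms like this:

```python
# Toy is_a hierarchy: child -> list of parents.
# Term names are illustrative only and are not taken from the report.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalytic activity": ["root"],
    "protein binding": ["binding"],
    "DNA binding": ["binding"],
    "kinase activity": ["catalytic activity"],
}


def ancestors(term):
    """Return the term itself plus every term reachable via parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def similarity(a, b):
    """Jaccard overlap of the two ancestor sets (assumed measure)."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def most_similar(query, top_n=3):
    """Rank every other term by similarity to the query, best first."""
    scored = [(similarity(query, t), t) for t in PARENTS if t != query]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_n]


if __name__ == "__main__":
    for score, term in most_similar("protein binding"):
        print(f"{term}: {score:.2f}")
```

With this toy hierarchy, `most_similar("protein binding")` ranks `binding` first and `DNA binding` second, because those terms share the most ancestors with the query.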
first_indexed 2025-11-15T20:46:42Z
format Monograph
id utm-4141
institution Universiti Teknologi Malaysia
institution_category Local University
language English
last_indexed 2025-11-15T20:46:42Z
publishDate 2007
publisher Faculty of Computer Science and Information System
recordtype eprints
repository_type Digital Repository
spelling utm-4141 2010-06-01T03:15:06Z http://eprints.utm.my/4141/ Faculty of Computer Science and Information System 2007-06-30 Monograph NonPeerReviewed application/pdf en http://eprints.utm.my/4141/1/74289.pdf Samsudin, Ruhaidah and Deris, Safaai and Othman, Muhammad Razib and Md. Illias, Rosli (2007) The development of semantic meta-database: an ontology based semantic integration of biological databases. Project Report. Faculty of Computer Science and Information System, Skudai, Johor. (Unpublished)
title The development of semantic meta-database: an ontology based semantic integration of biological databases
topic QA75 Electronic computers. Computer science
url http://eprints.utm.my/4141/
http://eprints.utm.my/4141/1/74289.pdf