Mining substructures in protein data

In this paper we consider the 'Prions' database that describes protein instances stored for Human Prion Proteins. The Prions database can be viewed as a database of rooted ordered labeled subtrees. Mining frequent substructures from tree databases is an important task and it has gained a c...

Bibliographic Details
Main Authors:	Hadzic, Fedja; Dillon, Tharam S.; Sidhu, Amandeep; Chang, Elizabeth; Tan, H.
Format:	Conference Paper
Published:	IEEE 2006
Subjects:	structure matching; Protein discovery; frequent subtree mining; association mining
Online Access:	http://hdl.handle.net/20.500.11937/7278

author	Hadzic, Fedja; Dillon, Tharam S.; Sidhu, Amandeep; Chang, Elizabeth; Tan, H.
building	Curtin Institutional Repository
collection	Online Access
description	In this paper we consider the 'Prions' database that describes protein instances stored for Human Prion Proteins. The Prions database can be viewed as a database of rooted ordered labeled subtrees. Mining frequent substructures from tree databases is an important task, and it has gained a considerable amount of interest in areas such as XML mining, bioinformatics, and Web mining. This has given rise to the development of many tree mining algorithms, which can aid in structural comparisons, association rule discovery and, more generally, the mining of tree-structured knowledge representations. Previously we developed the MB3 tree mining algorithm, which, given a minimum support threshold, efficiently discovers all frequent embedded subtrees from a database of rooted ordered labeled subtrees. In this work we apply the algorithm to the Prions database in order to extract the frequently occurring patterns, which in this case are of the induced subtree type. Obtaining the set of frequent induced subtrees from the Prions database can potentially reveal useful knowledge. This aspect is demonstrated by an analysis of the extracted frequent subtrees with respect to discovering interesting protein information. Furthermore, the minimum support threshold can be used as the controlling factor for answering specific queries posed on the Prions dataset. This approach is shown to be a viable technique for mining protein data.
first_indexed	2025-11-14T06:15:32Z
format	Conference Paper
id	curtin-20.500.11937-7278
institution	Curtin University Malaysia
institution_category	Local University
last_indexed	2025-11-14T06:15:32Z
publishDate	2006
publisher	IEEE
recordtype	eprints
repository_type	Digital Repository
title	Mining substructures in protein data
topic	structure matching; Protein discovery; frequent subtree mining; association mining
url	http://hdl.handle.net/20.500.11937/7278

Similar Items