Mining Induced/Embedded Subtrees using the Level of Embedding Constraint

The increasing need for representing information through more complex structures where semantics and relationships among data objects can be more easily expressed has resulted in many semi-structured data sources. Structure comparison among semi-structured data objects can often reveal valuable info...

Full description

Bibliographic Details
Main Authors:	Tan, H., Hadzic, Fedja, Dillon, T.
Format:	Journal Article
Published:	IOS Press 2012
Online Access:	http://hdl.handle.net/20.500.11937/34877

_version_	1848754342648086528
author	Tan, H. Hadzic, Fedja Dillon, T.
author_facet	Tan, H. Hadzic, Fedja Dillon, T.
author_sort	Tan, H.
building	Curtin Institutional Repository
collection	Online Access
description	The increasing need for representing information through more complex structures where semantics and relationships among data objects can be more easily expressed has resulted in many semi-structured data sources. Structure comparison among semi-structured data objects can often reveal valuable information, and hence tree mining has gained a considerable amount of interest in areas such as XML mining, Bioinformatics, Web mining etc. We are primarily concerned with the task of mining frequent ordered induced and embedded subtrees from a database of rooted ordered labelled trees. Our previous contributions consist of the efficient Tree Model Guided (TMG) candidate enumeration approach for which we developed a mathematical model that provides an estimate of the worst case complexity for embedded subtree mining. This potentially reveals computationally impractical situations where one would be forced to constrain the mining process in some way so that at least some patterns can be discovered. This motivated our strategy of tackling the complexity of mining embedded subtrees by introducing the Level of Embedding constraint.Thus, when it is too costly to mine all frequent embedded subtrees, one can decrease the level of embedding constraint gradually down to 1, from which all the obtained frequent subtrees are induced subtrees. In this paper we develop alternative implementations and propose two algorithms MB3-R and iMB3-R, which achieve better efficiency in terms of time and space. Furthermore, we develop a mathematical model for estimating the worst case complexity for induced subtree mining. It is accompanied with a theoretical analysis of induced-embedded subtree relationships in terms of complexity for frequent subtree mining. Using synthetic and real world data we practically demonstrate the space and time efficiency of our new approach and provide some comparisons to the two well know algorithms for mining induced and embedded subtrees.
first_indexed	2025-11-14T08:38:53Z
format	Journal Article
id	curtin-20.500.11937-34877
institution	Curtin University Malaysia
institution_category	Local University
last_indexed	2025-11-14T08:38:53Z
publishDate	2012
publisher	IOS Press
recordtype	eprints
repository_type	Digital Repository
spelling	curtin-20.500.11937-348772017-09-13T15:28:49Z Mining Induced/Embedded Subtrees using the Level of Embedding Constraint Tan, H. Hadzic, Fedja Dillon, T. The increasing need for representing information through more complex structures where semantics and relationships among data objects can be more easily expressed has resulted in many semi-structured data sources. Structure comparison among semi-structured data objects can often reveal valuable information, and hence tree mining has gained a considerable amount of interest in areas such as XML mining, Bioinformatics, Web mining etc. We are primarily concerned with the task of mining frequent ordered induced and embedded subtrees from a database of rooted ordered labelled trees. Our previous contributions consist of the efficient Tree Model Guided (TMG) candidate enumeration approach for which we developed a mathematical model that provides an estimate of the worst case complexity for embedded subtree mining. This potentially reveals computationally impractical situations where one would be forced to constrain the mining process in some way so that at least some patterns can be discovered. This motivated our strategy of tackling the complexity of mining embedded subtrees by introducing the Level of Embedding constraint.Thus, when it is too costly to mine all frequent embedded subtrees, one can decrease the level of embedding constraint gradually down to 1, from which all the obtained frequent subtrees are induced subtrees. In this paper we develop alternative implementations and propose two algorithms MB3-R and iMB3-R, which achieve better efficiency in terms of time and space. Furthermore, we develop a mathematical model for estimating the worst case complexity for induced subtree mining. It is accompanied with a theoretical analysis of induced-embedded subtree relationships in terms of complexity for frequent subtree mining. Using synthetic and real world data we practically demonstrate the space and time efficiency of our new approach and provide some comparisons to the two well know algorithms for mining induced and embedded subtrees. 2012 Journal Article http://hdl.handle.net/20.500.11937/34877 10.3233/FI-2012-733 IOS Press restricted
spellingShingle	Tan, H. Hadzic, Fedja Dillon, T. Mining Induced/Embedded Subtrees using the Level of Embedding Constraint
title	Mining Induced/Embedded Subtrees using the Level of Embedding Constraint
title_full	Mining Induced/Embedded Subtrees using the Level of Embedding Constraint
title_fullStr	Mining Induced/Embedded Subtrees using the Level of Embedding Constraint
title_full_unstemmed	Mining Induced/Embedded Subtrees using the Level of Embedding Constraint
title_short	Mining Induced/Embedded Subtrees using the Level of Embedding Constraint
title_sort	mining induced/embedded subtrees using the level of embedding constraint
url	http://hdl.handle.net/20.500.11937/34877

Mining Induced/Embedded Subtrees using the Level of Embedding Constraint

Similar Items