Razor: Mining distance-constrained embedded subtrees

Our work is focused on the task of mining frequent subtrees from a database of rooted ordered labelled subtrees. Previously we have developed an efficient algorithm, MB3 [12], for mining frequent embedded subtrees from a database of rooted labeled and ordered subtrees. The efficiency comes from the...

Full description

Bibliographic Details
Main Authors:	Tan, H., Dillon, Tharam S., Hadzic, Fedja, Chang, Elizabeth
Format:	Conference Paper
Published:	IEEE 2006
Subjects:	embedded subtree structure matching mining with constraints frequent subtree mining association mining
Online Access:	http://hdl.handle.net/20.500.11937/5006

_version_	1848744674404073472
author	Tan, H. Dillon, Tharam S. Hadzic, Fedja Chang, Elizabeth
author_facet	Tan, H. Dillon, Tharam S. Hadzic, Fedja Chang, Elizabeth
author_sort	Tan, H.
building	Curtin Institutional Repository
collection	Online Access
description	Our work is focused on the task of mining frequent subtrees from a database of rooted ordered labelled subtrees. Previously we have developed an efficient algorithm, MB3 [12], for mining frequent embedded subtrees from a database of rooted labeled and ordered subtrees. The efficiency comes from the utilization of a novel Embedding List representation for Tree Model Guided (TMG) candidate generation. As an extension the IMB3 [13] algorithm introduces the Level of Embedding constraint. In this study we extend our past work by developing an algorithm, Razor, for mining embedded subtrees where the distance of nodes relative to the root of the subtree needs to be considered. This notion of distance constrained embedded tree mining will have important applications in web information systems, conceptual model analysis and more sophisticated ontology matching. Domains representing their knowledge in a tree structured form may require this additional distance information as it commonly indicates the amount of specific knowledge stored about a particular concept within the hierarchy. The structure based approaches for schema matching commonly take the distance among the concept nodes within a sub-structure into account when evaluating the concept similarity across different schemas. We present an encoding strategy to efficiently enumerate candidate subtrees taking the distance of nodes relative to the root of the subtree into account. The algorithm is applied to both synthetic and real-world datasets, and the experimental results demonstrate the correctness and effectiveness of the proposed technique.
first_indexed	2025-11-14T06:05:13Z
format	Conference Paper
id	curtin-20.500.11937-5006
institution	Curtin University Malaysia
institution_category	Local University
last_indexed	2025-11-14T06:05:13Z
publishDate	2006
publisher	IEEE
recordtype	eprints
repository_type	Digital Repository
spelling	curtin-20.500.11937-50062017-01-30T10:43:12Z Razor: Mining distance-constrained embedded subtrees Tan, H. Dillon, Tharam S. Hadzic, Fedja Chang, Elizabeth embedded subtree structure matching mining with constraints frequent subtree mining association mining Our work is focused on the task of mining frequent subtrees from a database of rooted ordered labelled subtrees. Previously we have developed an efficient algorithm, MB3 [12], for mining frequent embedded subtrees from a database of rooted labeled and ordered subtrees. The efficiency comes from the utilization of a novel Embedding List representation for Tree Model Guided (TMG) candidate generation. As an extension the IMB3 [13] algorithm introduces the Level of Embedding constraint. In this study we extend our past work by developing an algorithm, Razor, for mining embedded subtrees where the distance of nodes relative to the root of the subtree needs to be considered. This notion of distance constrained embedded tree mining will have important applications in web information systems, conceptual model analysis and more sophisticated ontology matching. Domains representing their knowledge in a tree structured form may require this additional distance information as it commonly indicates the amount of specific knowledge stored about a particular concept within the hierarchy. The structure based approaches for schema matching commonly take the distance among the concept nodes within a sub-structure into account when evaluating the concept similarity across different schemas. We present an encoding strategy to efficiently enumerate candidate subtrees taking the distance of nodes relative to the root of the subtree into account. The algorithm is applied to both synthetic and real-world datasets, and the experimental results demonstrate the correctness and effectiveness of the proposed technique. 2006 Conference Paper http://hdl.handle.net/20.500.11937/5006 IEEE fulltext
spellingShingle	embedded subtree structure matching mining with constraints frequent subtree mining association mining Tan, H. Dillon, Tharam S. Hadzic, Fedja Chang, Elizabeth Razor: Mining distance-constrained embedded subtrees
title	Razor: Mining distance-constrained embedded subtrees
title_full	Razor: Mining distance-constrained embedded subtrees
title_fullStr	Razor: Mining distance-constrained embedded subtrees
title_full_unstemmed	Razor: Mining distance-constrained embedded subtrees
title_short	Razor: Mining distance-constrained embedded subtrees
title_sort	razor: mining distance-constrained embedded subtrees
topic	embedded subtree structure matching mining with constraints frequent subtree mining association mining
url	http://hdl.handle.net/20.500.11937/5006

Razor: Mining distance-constrained embedded subtrees

Similar Items