Model guided algorithm for mining unordered embedded subtrees

A large amount of online information is or can be represented using semi-structured documents, such as XML. The information contained in an XML document can be effectively represented using a rooted ordered labeled tree. As a result, the frequent pattern mining problem has been recast as the frequent subtree...

Bibliographic Details
Main Authors: Hadzic, Fedja, Tan, H., Dillon, Tharam S.
Format: Journal Article
Published: IOS Press 2010
Subjects: data mining; Tree mining; unordered embedded subtrees; canonical form; algorithm
Online Access: http://hdl.handle.net/20.500.11937/37772
_version_ 1848755139754590208
author Hadzic, Fedja
Tan, H.
Dillon, Tharam S.
building Curtin Institutional Repository
collection Online Access
description A large amount of online information is or can be represented using semi-structured documents, such as XML. The information contained in an XML document can be effectively represented using a rooted ordered labeled tree. As a result, the frequent pattern mining problem has been recast as the frequent subtree mining problem, which is a prerequisite for association rule mining from tree-structured documents. Driven by different application needs, a number of algorithms have been developed for mining different subtree types under different support definitions. In this paper we present an algorithm for mining unordered embedded subtrees. It extends our general tree model guided (TMG) candidate generation framework, and the proposed U3 algorithm considers all support definitions, namely transaction-based, occurrence-match, and hybrid support. A number of experiments are presented on synthetic and real-world data sets. The results demonstrate the flexibility of our general TMG framework as well as its efficiency when compared to the existing state-of-the-art approach.
first_indexed 2025-11-14T08:51:34Z
format Journal Article
id curtin-20.500.11937-37772
institution Curtin University Malaysia
institution_category Local University
last_indexed 2025-11-14T08:51:34Z
publishDate 2010
publisher IOS Press
recordtype eprints
repository_type Digital Repository
spelling curtin-20.500.11937-37772 2017-09-13T15:58:11Z 2010 Journal Article http://hdl.handle.net/20.500.11937/37772 10.3233/WIA-2010-0200 IOS Press restricted
title Model guided algorithm for mining unordered embedded subtrees
topic data mining
Tree mining
unordered embedded subtrees
canonical form
algorithm
url http://hdl.handle.net/20.500.11937/37772