Tree model guided candidate generation for mining frequent subtrees from XML

Due to the inherent flexibilities in both structure and semantics, XML association rule mining faces a few challenges, such as a more complicated hierarchical data structure and an ordered data context. Mining frequent patterns from XML documents can be recast as mining frequent tree structures from a...

Bibliographic Details
Main Authors: Tan, Henry; Hadzic, Fedja; Dillon, Tharam S.; Chang, Elizabeth; Feng, Ling; Feng, L.
Format: Journal Article
Published: ACM 2008
Subjects: FREQT; TreeMiner; Tree Model Guided; TMG; Tree Mining
Online Access: http://doi.acm.org/10.1145/1376815.1376818
http://hdl.handle.net/20.500.11937/14717
_version_ 1848748697925451776
author Tan, Henry
Hadzic, Fedja
Dillon, Tharam S.
Chang, Elizabeth
Feng, Ling
Feng, L.
author_facet Tan, Henry
Hadzic, Fedja
Dillon, Tharam S.
Chang, Elizabeth
Feng, Ling
Feng, L.
author_sort Tan, Henry
building Curtin Institutional Repository
collection Online Access
description Due to the inherent flexibilities in both structure and semantics, XML association rule mining faces a few challenges, such as a more complicated hierarchical data structure and an ordered data context. Mining frequent patterns from XML documents can be recast as mining frequent tree structures from a database of XML documents. In this study, we model a database of XML documents as a database of rooted labeled ordered subtrees. In particular, we are mainly concerned with mining frequent induced and embedded ordered subtrees. Our main contributions are as follows. We describe our unique embedding list representation of the tree structure, which enables efficient implementation of our Tree Model Guided (TMG) candidate generation. TMG is an optimal, non-redundant enumeration strategy which enumerates all the valid candidates that conform to the structural aspects of the data. We show through a mathematical model and experiments that TMG has better complexity compared to the commonly used join approach. In this paper, we propose two algorithms, MB3-Miner and iMB3-Miner. MB3-Miner mines embedded subtrees. iMB3-Miner mines induced and/or embedded subtrees by using the maximum level of embedding constraint. Our experiments with both synthetic and real datasets against two well-known algorithms for mining induced and embedded subtrees demonstrate the effectiveness and the efficiency of the proposed techniques.
first_indexed 2025-11-14T07:09:10Z
format Journal Article
id curtin-20.500.11937-14717
institution Curtin University Malaysia
institution_category Local University
last_indexed 2025-11-14T07:09:10Z
publishDate 2008
publisher ACM
recordtype eprints
repository_type Digital Repository
spelling curtin-20.500.11937-147172019-02-19T05:34:54Z Tree model guided candidate generation for mining frequent subtrees from XML Tan, Henry Hadzic, Fedja Dillon, Tharam S. Chang, Elizabeth Feng, Ling Feng, L. FREQT TreeMiner Tree Model Guided TMG Tree Mining Due to the inherent flexibilities in both structure and semantics, XML association rule mining faces a few challenges, such as a more complicated hierarchical data structure and an ordered data context. Mining frequent patterns from XML documents can be recast as mining frequent tree structures from a database of XML documents. In this study, we model a database of XML documents as a database of rooted labeled ordered subtrees. In particular, we are mainly concerned with mining frequent induced and embedded ordered subtrees. Our main contributions are as follows. We describe our unique embedding list representation of the tree structure, which enables efficient implementation of our Tree Model Guided (TMG) candidate generation. TMG is an optimal, non-redundant enumeration strategy which enumerates all the valid candidates that conform to the structural aspects of the data. We show through a mathematical model and experiments that TMG has better complexity compared to the commonly used join approach. In this paper, we propose two algorithms, MB3-Miner and iMB3-Miner. MB3-Miner mines embedded subtrees. iMB3-Miner mines induced and/or embedded subtrees by using the maximum level of embedding constraint. Our experiments with both synthetic and real datasets against two well-known algorithms for mining induced and embedded subtrees demonstrate the effectiveness and the efficiency of the proposed techniques. 2008 Journal Article http://hdl.handle.net/20.500.11937/14717 http://doi.acm.org/10.1145/1376815.1376818 ACM fulltext
spellingShingle FREQT
TreeMiner
Tree Model Guided
TMG
Tree Mining
Tan, Henry
Hadzic, Fedja
Dillon, Tharam S.
Chang, Elizabeth
Feng, Ling
Feng, L.
Tree model guided candidate generation for mining frequent subtrees from XML
title Tree model guided candidate generation for mining frequent subtrees from XML
title_full Tree model guided candidate generation for mining frequent subtrees from XML
title_fullStr Tree model guided candidate generation for mining frequent subtrees from XML
title_full_unstemmed Tree model guided candidate generation for mining frequent subtrees from XML
title_short Tree model guided candidate generation for mining frequent subtrees from XML
title_sort tree model guided candidate generation for mining frequent subtrees from xml
topic FREQT
TreeMiner
Tree Model Guided
TMG
Tree Mining
url http://doi.acm.org/10.1145/1376815.1376818
http://hdl.handle.net/20.500.11937/14717