X3-Miner: Mining patterns from XML database

An XML enabled framework for representation of association rules in databases was first presented in [Feng03]. In Frequent Structure Mining (FSM), there are techniques proposed to mine frequent patterns from complex trees and graphs databases. One of the popular approaches is to use graph matching....

Full description

Bibliographic Details
Main Authors: Chang, Elizabeth, Tan, H., Dillon, Tharam S., Feng, L., Hadzic, Fedja
Format: Conference Paper
Published: WIT Press 2005
Subjects:
Online Access:http://www.witpress.com
http://hdl.handle.net/20.500.11937/46300
_version_ 1848757519727460352
author Chang, Elizabeth
Tan, H.
Dillon, Tharam S.
Feng, L.
Hadzic, Fedja
author_facet Chang, Elizabeth
Tan, H.
Dillon, Tharam S.
Feng, L.
Hadzic, Fedja
author_sort Chang, Elizabeth
building Curtin Institutional Repository
collection Online Access
description An XML enabled framework for representation of association rules in databases was first presented in [Feng03]. In Frequent Structure Mining (FSM), there are techniques proposed to mine frequent patterns from complex trees and graphs databases. One of the popular approaches is to use graph matching. Graph matching algorithms use data structures such as the adjacency matrix [Inokuchi00] or adjacency list [FSG01]. Another approach represents semi-structured tree-like structures using a string representation, which is more space efficient and relatively easy for manipulation [Zaki02]. However, in the XML Era, mining association rules is faced with more challenges due to the inherent flexibilities of XML in both structure and semantics. The primary challenges include 1) a more complicated hierarchical data structure with tags and attributes; 2) an ordered data context; and 3) a much bigger data size. To tackle these challenges, in this paper, we propose an approach, X3-Miner, that efficiently extracts patterns from a large XML data set, and overcomes the challenges by:(1) Exploring the use of a model validating approach in deducing the number of candidates generated. The basic idea is that by taking into account of the semantics embedded in the tree-like structure in an XML database while generating candidates directly from the XML tree, we can obtain only valid (i.e., possibly existing) candidates out of the XML database;(2) Minimising I/O overhead by first trimming the infrequent 1-itemset in the XML database. The XML database is intersected with the frequent 1-itemset resulting in a smaller XML tree that contains only the frequent 1-itemset. The algorithm also progressively trims infrequent k-itemsets that contain infrequent (k-1)-itemsets.(3) Extending the notion of string representation of a tree structure proposed in [Zaki02] to xstring for describing an XML document in a flat format without loss of both structure and semantics. Such an extension enables an easier traversal of the tree-structured XML data during our model-validating candidate generation.Our experiments with both synthetic and real-life data sets demonstrate the effectiveness of the proposed model-validating approach in mining XML data.
first_indexed 2025-11-14T09:29:23Z
format Conference Paper
id curtin-20.500.11937-46300
institution Curtin University Malaysia
institution_category Local University
last_indexed 2025-11-14T09:29:23Z
publishDate 2005
publisher WIT Press
recordtype eprints
repository_type Digital Repository
spelling curtin-20.500.11937-463002017-01-30T15:26:17Z X3-Miner: Mining patterns from XML database Chang, Elizabeth Tan, H. Dillon, Tharam S. Feng, L. Hadzic, Fedja X3-Miner Data Mining Semantic Relationships Association Mining information systems Algorithm database mining XML An XML enabled framework for representation of association rules in databases was first presented in [Feng03]. In Frequent Structure Mining (FSM), there are techniques proposed to mine frequent patterns from complex trees and graphs databases. One of the popular approaches is to use graph matching. Graph matching algorithms use data structures such as the adjacency matrix [Inokuchi00] or adjacency list [FSG01]. Another approach represents semi-structured tree-like structures using a string representation, which is more space efficient and relatively easy for manipulation [Zaki02]. However, in the XML Era, mining association rules is faced with more challenges due to the inherent flexibilities of XML in both structure and semantics. The primary challenges include 1) a more complicated hierarchical data structure with tags and attributes; 2) an ordered data context; and 3) a much bigger data size. To tackle these challenges, in this paper, we propose an approach, X3-Miner, that efficiently extracts patterns from a large XML data set, and overcomes the challenges by:(1) Exploring the use of a model validating approach in deducing the number of candidates generated. The basic idea is that by taking into account of the semantics embedded in the tree-like structure in an XML database while generating candidates directly from the XML tree, we can obtain only valid (i.e., possibly existing) candidates out of the XML database;(2) Minimising I/O overhead by first trimming the infrequent 1-itemset in the XML database. The XML database is intersected with the frequent 1-itemset resulting in a smaller XML tree that contains only the frequent 1-itemset. The algorithm also progressively trims infrequent k-itemsets that contain infrequent (k-1)-itemsets.(3) Extending the notion of string representation of a tree structure proposed in [Zaki02] to xstring for describing an XML document in a flat format without loss of both structure and semantics. Such an extension enables an easier traversal of the tree-structured XML data during our model-validating candidate generation.Our experiments with both synthetic and real-life data sets demonstrate the effectiveness of the proposed model-validating approach in mining XML data. 2005 Conference Paper http://hdl.handle.net/20.500.11937/46300 http://www.witpress.com http://library.witpress.com/pages/listpapers.asp?q_bid=327&q_subject=Computing%20_%20Information%20Management WIT Press fulltext
spellingShingle X3-Miner
Data Mining
Semantic Relationships
Association Mining
information systems
Algorithm
database mining
XML
Chang, Elizabeth
Tan, H.
Dillon, Tharam S.
Feng, L.
Hadzic, Fedja
X3-Miner: Mining patterns from XML database
title X3-Miner: Mining patterns from XML database
title_full X3-Miner: Mining patterns from XML database
title_fullStr X3-Miner: Mining patterns from XML database
title_full_unstemmed X3-Miner: Mining patterns from XML database
title_short X3-Miner: Mining patterns from XML database
title_sort x3-miner: mining patterns from xml database
topic X3-Miner
Data Mining
Semantic Relationships
Association Mining
information systems
Algorithm
database mining
XML
url http://www.witpress.com
http://www.witpress.com
http://hdl.handle.net/20.500.11937/46300