A structure preserving flat data format representation for tree-structured data

Mining of semi-structured data such as XML is a popular research topic due to its many useful applications. Initial work focused mainly on values associated with tags, while most of the recent developments focus on discovering association rules among tree-structured data objects to preserve the structur...

Bibliographic Details
Main Author: Hadzic, Fedja
Other Authors: L. Cao
Format: Conference Paper
Published: Springer 2012
Subjects: XML mining; decision tree learning from XML data; tree mining
Online Access: http://conferences.telecom-bretagne.eu/data/qimie2011/hadzic-informal_QIMIE_2011.pdf
http://hdl.handle.net/20.500.11937/37719
_version_ 1848755125613494272
author Hadzic, Fedja
author2 L. Cao
author_facet L. Cao
Hadzic, Fedja
author_sort Hadzic, Fedja
building Curtin Institutional Repository
collection Online Access
description Mining of semi-structured data such as XML is a popular research topic due to its many useful applications. Initial work focused mainly on values associated with tags, while most of the recent developments focus on discovering association rules among tree-structured data objects to preserve the structural information. Other data mining techniques have had limited use in tree-structured data analysis, as they were mainly designed to process data in a flat format, with no need to capture the structural properties of data objects. This paper proposes a novel structure-preserving way of representing tree-structured document instances as records in a standard flat data structure, enabling the application of a wider range of data analysis techniques. Experiments using synthetic and real-world data demonstrate the effectiveness of the proposed approach.
first_indexed 2025-11-14T08:51:20Z
format Conference Paper
id curtin-20.500.11937-37719
institution Curtin University Malaysia
institution_category Local University
last_indexed 2025-11-14T08:51:20Z
publishDate 2012
publisher Springer
recordtype eprints
repository_type Digital Repository
spelling curtin-20.500.11937-37719 2023-02-07T08:01:20Z A structure preserving flat data format representation for tree-structured data Hadzic, Fedja L. Cao J. Huang J. Bailey Y. Koh J. Luo XML mining decision tree learning from XML data tree mining 2012 Conference Paper http://hdl.handle.net/20.500.11937/37719 http://conferences.telecom-bretagne.eu/data/qimie2011/hadzic-informal_QIMIE_2011.pdf Springer restricted
title A structure preserving flat data format representation for tree-structured data
topic XML mining
decision tree learning from XML data
tree mining
url http://conferences.telecom-bretagne.eu/data/qimie2011/hadzic-informal_QIMIE_2011.pdf
http://hdl.handle.net/20.500.11937/37719