XML document clustering using structure-preserving flat representation of XML content and structure
With the increasing use of XML in many domains, XML document clustering has been a central research topic in semistructured data management and mining. Due to the semistructured nature of XML data, the clustering problem becomes particularly challenging, mainly because structural similarity measures...
| Main Authors: | Hadzic, Fedja; Hecker, Michael; Tagerelli, A. |
|---|---|
| Other Authors: | Deyi Li |
| Format: | Conference Paper |
| Published: | Springer 2011 |
| Online Access: | http://hdl.handle.net/20.500.11937/4997 |
| _version_ | 1848744671849742336 |
|---|---|
| author | Hadzic, Fedja Hecker, Michael Tagerelli, A. |
| author2 | Deyi Li |
| author_facet | Deyi Li Hadzic, Fedja Hecker, Michael Tagerelli, A. |
| author_sort | Hadzic, Fedja |
| building | Curtin Institutional Repository |
| collection | Online Access |
| description | With the increasing use of XML in many domains, XML document clustering has been a central research topic in semistructured data management and mining. Due to the semistructured nature of XML data, the clustering problem becomes particularly challenging, mainly because structural similarity measures specifically designed to deal with tree/graph-shaped data can be quite expensive. Specialized clustering techniques are being developed to account for this difficulty; however, most of them still assume that XML documents are represented using a semistructured data model. In this paper we take a simpler approach whereby XML structural aspects are extracted from the documents to generate a flat data format to which well-established clustering methods can be directly applied. Hence, the expensive process of tree/graph data mining is avoided, while the structural properties are still preserved. Our experimental evaluation, using a number of real-world datasets and comparing with existing structural clustering methods, has demonstrated the significance of our approach. |
| first_indexed | 2025-11-14T06:05:11Z |
| format | Conference Paper |
| id | curtin-20.500.11937-4997 |
| institution | Curtin University Malaysia |
| institution_category | Local University |
| last_indexed | 2025-11-14T06:05:11Z |
| publishDate | 2011 |
| publisher | Springer |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | curtin-20.500.11937-4997 2023-01-18T08:46:46Z XML document clustering using structure-preserving flat representation of XML content and structure Hadzic, Fedja Hecker, Michael Tagerelli, A. Deyi Li Bing Liu Charu C Aggarwal With the increasing use of XML in many domains, XML document clustering has been a central research topic in semistructured data management and mining. Due to the semistructured nature of XML data, the clustering problem becomes particularly challenging, mainly because structural similarity measures specifically designed to deal with tree/graph-shaped data can be quite expensive. Specialized clustering techniques are being developed to account for this difficulty; however, most of them still assume that XML documents are represented using a semistructured data model. In this paper we take a simpler approach whereby XML structural aspects are extracted from the documents to generate a flat data format to which well-established clustering methods can be directly applied. Hence, the expensive process of tree/graph data mining is avoided, while the structural properties are still preserved. Our experimental evaluation, using a number of real-world datasets and comparing with existing structural clustering methods, has demonstrated the significance of our approach. 2011 Conference Paper http://hdl.handle.net/20.500.11937/4997 Springer restricted |
| spellingShingle | Hadzic, Fedja Hecker, Michael Tagerelli, A. XML document clustering using structure-preserving flat representation of XML content and structure |
| title | XML document clustering using structure-preserving flat representation of XML content and structure |
| url | http://hdl.handle.net/20.500.11937/4997 |
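
The description above mentions extracting XML structural aspects into a flat data format to which standard clustering methods can be applied. The following is only a minimal illustrative sketch of that general idea, not the authors' actual representation: it assumes, purely for illustration, that each document is described by the set of its root-to-node tag paths and that each distinct path becomes one binary column. The function names `tag_paths` and `flatten` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths found in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def flatten(docs):
    """Build a flat binary table: one row per document, one column per path.

    Assumption for illustration only: presence/absence of a tag path is
    used as the structural feature.
    """
    per_doc = [tag_paths(d) for d in docs]
    columns = sorted(set().union(*per_doc))
    rows = [[1 if c in p else 0 for c in columns] for p in per_doc]
    return columns, rows


# Hypothetical sample documents, not taken from the paper's datasets.
docs = [
    "<book><title>A</title><author>X</author></book>",
    "<book><title>B</title><author>Y</author><year>2011</year></book>",
    "<article><title>C</title><journal>J</journal></article>",
]
columns, rows = flatten(docs)
```

Under these assumptions, `rows` is an ordinary fixed-width table, so any conventional clustering algorithm that accepts numeric or binary vectors could in principle be run on it without tree or graph matching.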