A Statistical Interestingness Measures for XML based Association Rules

Recently, mining frequent substructures from XML data has gained a considerable amount of interest. Different methods have been proposed and examined for mining frequent patterns from XML documents efficiently and effectively. While many frequent XML patterns generated are useful and interesting, it...

Bibliographic Details
Main Authors: Mohd Shaharanee, Izwan, Hadzic, Fedja, Dillon, Tharam S.
Other Authors: Byoung Tak Zhang
Format: Book Chapter
Published: Springer 2010
Subjects: data mining, semi-structured data, statistical analysis, interesting rules
Online Access: http://hdl.handle.net/20.500.11937/18190
_version_ 1848749674058481664
author Mohd Shaharanee, Izwan
Hadzic, Fedja
Dillon, Tharam S.
author2 Byoung Tak Zhang
author_facet Byoung Tak Zhang
Mohd Shaharanee, Izwan
Hadzic, Fedja
Dillon, Tharam S.
author_sort Mohd Shaharanee, Izwan
building Curtin Institutional Repository
collection Online Access
description Recently, mining frequent substructures from XML data has gained a considerable amount of interest. Different methods have been proposed and examined for mining frequent patterns from XML documents efficiently and effectively. While many of the frequent XML patterns generated are useful and interesting, a large portion of them is commonly not considered interesting or significant for the application at hand. In this paper, we present a systematic approach to ascertain whether the discovered XML patterns are significant and not just coincidental associations, and provide a precise statistical approach to support this framework. The proposed strategy combines data mining and statistical measurement techniques to discard the non-significant patterns. We consider the “Prions” database, which describes the protein instances stored for Human Prions Protein. The proposed unified framework is applied to this dataset to demonstrate its effectiveness in assessing the interestingness of discovered XML patterns by statistical means. When the dataset is used for classification/prediction purposes, the proposed approach discards non-significant XML patterns without reducing the accuracy of the pattern set as a whole.
first_indexed 2025-11-14T07:24:41Z
format Book Chapter
id curtin-20.500.11937-18190
institution Curtin University Malaysia
institution_category Local University
last_indexed 2025-11-14T07:24:41Z
publishDate 2010
publisher Springer
recordtype eprints
repository_type Digital Repository
spelling curtin-20.500.11937-181902023-01-13T07:56:29Z A Statistical Interestingness Measures for XML based Association Rules Mohd Shaharanee, Izwan Hadzic, Fedja Dillon, Tharam S. Byoung Tak Zhang Mehmet A Orgun data mining semi-structured data statistical analysis interesting rules Recently mining frequent substructures from XML data has gained a considerable amount of interest. Different methods have been proposed and examined for mining frequent patterns from XML documents efficiently and effectively. While many frequent XML patterns generated are useful and interesting, it is common that a large portion of them is not considered as interesting or significant for the application at hand. In this paper, we present a systematic approach to ascertain whether the discovered XML patterns are significant and not just coincidental associations, and provide a precise statistical approach to support this framework. The proposed strategy combines data mining and statistical measurement techniques to discard the non significant patterns. In this paper we considered the “Prions” database that describes the protein instances stored for Human Prions Protein. The proposed unified framework is applied on this dataset to demonstrate its effectiveness in assessing interestingness of discovered XML patterns by statistical means. When the dataset is used for classification/prediction purposes, the proposed approach will discard non significant XML patterns, without the cost of a reduction in the accuracy of the pattern set as a whole. 2010 Book Chapter http://hdl.handle.net/20.500.11937/18190 Springer restricted
spellingShingle data mining
semi-structured data
statistical analysis
interesting rules
Mohd Shaharanee, Izwan
Hadzic, Fedja
Dillon, Tharam S.
A Statistical Interestingness Measures for XML based Association Rules
title A Statistical Interestingness Measures for XML based Association Rules
title_full A Statistical Interestingness Measures for XML based Association Rules
title_fullStr A Statistical Interestingness Measures for XML based Association Rules
title_full_unstemmed A Statistical Interestingness Measures for XML based Association Rules
title_short A Statistical Interestingness Measures for XML based Association Rules
title_sort statistical interestingness measures for xml based association rules
topic data mining
semi-structured data
statistical analysis
interesting rules
url http://hdl.handle.net/20.500.11937/18190