Mining unordered distance-constrained embedded subtrees

Bibliographic Details
Main Authors: Hadzic, Fedja, Tan, Henry, Dillon, Tharam S.
Other Authors: J-F. Boulicaut
Format: Conference Paper
Published: Springer 2008
Online Access: http://hdl.handle.net/20.500.11937/28805
Description
Summary: Frequent subtree mining is an important problem in association rule mining from semi-structured or tree-structured documents, which are common in many commercial, web and scientific domains. This paper presents the u3Razor algorithm for mining unordered embedded subtrees in which the distance of each node relative to the root of the subtree must be taken into account. Mining such distance-constrained unordered embedded subtrees will have important applications in web information systems, conceptual model analysis and more sophisticated knowledge matching. An encoding strategy is presented that efficiently enumerates candidate unordered embedded subtrees while taking the distance of nodes relative to the subtree root into account. Both synthetic and real-world datasets were used for experimental evaluation and discussion.
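To make the distance constraint concrete, here is a minimal sketch of an inclusion test. It is not the u3Razor algorithm or its encoding strategy, neither of which this record describes. It is a brute-force backtracking check under an assumed definition: a pattern occurs in a tree if there is an injective, label-preserving mapping of pattern nodes to tree nodes that preserves ancestor-descendant relationships in both directions (the embedded condition), never checks sibling order (unordered), and places every pattern node k exactly dist[k] levels below the image of the pattern root (the distance constraint). The names Tree, occurs and support, the parent-array layout, and counting support as the number of database trees that contain the pattern are hypothetical choices made for illustration.

```python
from typing import List


class Tree:
    """Rooted, labelled tree as parallel arrays (hypothetical layout).

    parent[i] is the index of node i's parent, or -1 for the root.
    Nodes are assumed to be listed so that parent[i] < i (e.g. pre-order).
    """

    def __init__(self, labels: List[str], parent: List[int]):
        self.labels = labels
        self.parent = parent
        # depth[i] = number of edges from the tree root to node i
        self.depth = [0] * len(labels)
        for i in range(len(labels)):
            p = parent[i]
            while p != -1:
                self.depth[i] += 1
                p = parent[p]

    def is_ancestor(self, a: int, b: int) -> bool:
        """True if node a is a proper ancestor of node b."""
        p = self.parent[b]
        while p != -1:
            if p == a:
                return True
            p = self.parent[p]
        return False


def occurs(pattern: Tree, dist: List[int], t: Tree) -> bool:
    """Assumed test: does `pattern` occur in `t` as an embedded subtree whose
    node k lies exactly dist[k] levels below the image of the pattern root?
    dist[0] is expected to be 0. Sibling order is never checked, which is
    what makes the match unordered."""
    n = len(pattern.labels)
    f = [-1] * n  # f[k] = tree node currently assigned to pattern node k

    def extend(k: int) -> bool:
        if k == n:
            return True  # every pattern node has been mapped
        for w in range(len(t.labels)):
            if w in f[:k] or t.labels[w] != pattern.labels[k]:
                continue  # keep the mapping injective and label-preserving
            if k > 0:
                # Distance constraint, measured from the image of the root.
                if t.depth[w] - t.depth[f[0]] != dist[k]:
                    continue
                # Embedded condition: ancestor relations among mapped nodes
                # must match the pattern exactly. Because parents precede
                # children, k is never an ancestor of an earlier node, so w
                # must not be an ancestor of any earlier image either.
                if not all(
                    pattern.is_ancestor(u, k) == t.is_ancestor(f[u], w)
                    and not t.is_ancestor(w, f[u])
                    for u in range(k)
                ):
                    continue
            f[k] = w
            if extend(k + 1):
                return True
            f[k] = -1  # backtrack and try the next candidate
        return False

    return extend(0)


def support(pattern: Tree, dist: List[int], database: List[Tree]) -> int:
    """Hypothetical transaction-based support: trees containing the pattern."""
    return sum(1 for t in database if occurs(pattern, dist, t))


if __name__ == "__main__":
    t1 = Tree(["a", "b", "c", "b"], [-1, 0, 0, 2])  # a(b, c(b))
    t2 = Tree(["a", "b"], [-1, 0])                  # a(b)
    p = Tree(["a", "b"], [-1, 0])                   # pattern a -> b
    print(support(p, [0, 2], [t1, t2]))  # → 1 (b two levels below a)
    print(support(p, [0, 1], [t1, t2]))  # → 2 (b directly below a)
```

In this sample, the pattern a → b with distance 2 matches t1 only by skipping the intermediate c node, which is what makes the occurrence embedded rather than induced; with distance 1 it matches both trees. The exhaustive search only serves to pin down the assumed definition and is exponential in the worst case.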