Effective pruning strategies for sequential pattern mining

In this paper, we systematically explore the search space of frequent sequence mining and present two novel pruning strategies, S E P (Sequence Extension Pruning) and I EP (Item Extension Pruning), which can be used in all Aption-like sequence mining algorithms or lattice-theoretic approaches. With...

Full description

Bibliographic Details
Main Authors: Xu, Y., Ma, Z., Li, L., Dillon, Tharam S.
Other Authors: Q. Luo
Format: Conference Paper
Published: Institute of Electrical and Electronics Engineers (IEEE) Computer Society 2008
Online Access:http://hdl.handle.net/20.500.11937/34160
Description
Summary:In this paper, we systematically explore the search space of frequent sequence mining and present two novel pruning strategies, S E P (Sequence Extension Pruning) and I EP (Item Extension Pruning), which can be used in all Aption-like sequence mining algorithms or lattice-theoretic approaches. With a little more memory overhead, proposed pruning strategies can prune invalidated search space and decrease the total cost of frequency counting effectively. For effectiveness testing reason, we optimize SPAM [2) and present the improved algorithm, S P AMSEPIEP' which uses S E P and IEP to prune the search space by sharing the frequent 2sequences lists. A set of comprehensive performance experiments study shows that S P AMSEPIEP outperforms SPAM by a factor of 10 on small datasets and better than 30 % to 50 % on reasonably large dataset.