Ascertaining data mining rules using statistical approaches

Bibliographic Details
Main Authors: Mohd Shaharanee, I., Dillon, Tharam S, Hadzic, Fedja
Other Authors: Parvinder S. Sandhu
Format: Conference Paper
Published: International Association of Computer Science and Information Technology (IACSIT) 2009
Online Access: http://hdl.handle.net/20.500.11937/43700
Description
Summary: Knowledge acquisition techniques have been well researched in the data mining community. Such techniques, especially when used for unsupervised learning, often generate a large quantity of rules and patterns. While many rules generated are useful and interesting, some information is not captured by those rules, such as already known patterns, coincidental patterns, and patterns with no significant value for real-world applications. Sustaining the interestingness of rules generated by data mining algorithms is an active and important area of data mining research. Different methods for discovering interestingness in rules have been proposed and well examined. These measures often reflect interestingness only with respect to the database being observed, so the rules satisfy the constraints with respect to the sample data only, not with respect to the whole data distribution. Therefore, one can still question the usefulness of the rules and patterns in practical problems. As data mining techniques are naturally data driven, it would be beneficial to confirm the generated hypotheses with a statistical methodology. In our research, we investigate how to combine data mining and statistical measurement techniques to arrive at a more reliable and interesting set of rules. Such a combination is essential for coping with data overload in practical problems. A real-world data set is used to explore the ways in which one can measure and verify the usefulness of rules from data mining techniques using statistical analysis.