A comparative study of reduced error pruning method in decision tree algorithms
Main Authors: | , , |
---|---|
Format: | Conference or Workshop Item |
Published: | 2012 |
Subjects: | |
Online Access: | http://eprints.uthm.edu.my/3466/ http://eprints.uthm.edu.my/3466/1/0074ID137.pdf |
Summary: | Decision trees are one of the most popular and
efficient techniques in data mining. The technique is well
established and has been explored by many researchers. However,
some decision tree algorithms may produce large trees that are
difficult to understand. Furthermore, misclassification of data
often occurs during the learning process. Therefore, a decision
tree algorithm that produces a simple tree structure with high
classification accuracy is needed for working with huge volumes
of data. Pruning methods have been introduced to reduce the
complexity of the tree structure without decreasing
classification accuracy. One such pruning method is Reduced
Error Pruning (REP). To better understand pruning methods, an
experiment was conducted using the Weka application to compare
the performance, in terms of tree-structure complexity and
classification accuracy, of the J48, REPTree, PART, JRip, and
Ridor algorithms on seven standard datasets from the UCI
machine learning repository. In data modeling, J48 and REPTree
generate a tree structure as output, while PART, Ridor, and
JRip generate rules. In addition, J48, REPTree, and PART use
the REP method for pruning, while Ridor and JRip use
improvements of the REP method, namely IREP and RIPPER. The
experimental results show that J48 and REPTree are competitive
in producing better results. Between J48 and REPTree, the
average difference in performance is 7.1006% for classification
accuracy and 6.2857% for tree-structure complexity. Among the
classification rule algorithms, Ridor is the best compared with
PART and JRip, because it achieves the highest classification
accuracy on five of the seven datasets. An algorithm that
produces high accuracy with a simple tree structure or simple
rules can be regarded as the best decision tree algorithm. |
---|
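
The summary names Reduced Error Pruning without showing how it works, so the following is a minimal sketch of the general idea, not the Weka REPTree or J48 implementation used in the study. It assumes a hand-built tree over categorical features and a separate validation set. The `Node` class, the helper functions, and the toy weather-style data are all hypothetical and exist only for illustration. Working bottom-up, each subtree is replaced by a leaf predicting its majority class whenever that replacement does not increase the error on the validation examples that reach it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Example = Tuple[Dict[str, str], str]  # (feature values, class label)


@dataclass
class Node:
    majority: str                   # majority training label at this node
    feature: Optional[str] = None   # None marks a leaf
    children: Dict[str, "Node"] = field(default_factory=dict)


def predict(node: Node, x: Dict[str, str]) -> str:
    # Follow the branches; fall back to the majority label on unseen values.
    while node.feature is not None:
        child = node.children.get(x.get(node.feature))
        if child is None:
            return node.majority
        node = child
    return node.majority


def errors(node: Node, data: List[Example]) -> int:
    # Number of misclassified examples.
    return sum(predict(node, x) != y for x, y in data)


def size(node: Node) -> int:
    # Total number of nodes, used here as the complexity measure.
    return 1 + sum(size(c) for c in node.children.values())


def reduced_error_prune(node: Node, val: List[Example]) -> Node:
    # Prune the children first, each on the validation examples reaching it.
    if node.feature is None:
        return node
    for value, child in list(node.children.items()):
        subset = [(x, y) for x, y in val if x.get(node.feature) == value]
        node.children[value] = reduced_error_prune(child, subset)
    # Collapse this subtree if a single leaf does no worse on validation data.
    leaf = Node(majority=node.majority)
    return leaf if errors(leaf, val) <= errors(node, val) else node


# Hypothetical overfitted tree: the "sunny" branch splits needlessly on "windy".
tree = Node("yes", "outlook", {
    "sunny": Node("yes", "windy", {"yes": Node("no"), "no": Node("yes")}),
    "rain": Node("no"),
})

# Hypothetical validation set, held out from training.
validation: List[Example] = [
    ({"outlook": "sunny", "windy": "yes"}, "yes"),
    ({"outlook": "sunny", "windy": "no"}, "yes"),
    ({"outlook": "rain", "windy": "no"}, "no"),
    ({"outlook": "rain", "windy": "yes"}, "no"),
]

# Measure before pruning, because pruning rewires the tree in place.
size_before, err_before = size(tree), errors(tree, validation)
pruned = reduced_error_prune(tree, validation)
print(size_before, err_before, size(pruned), errors(pruned, validation))
```

In this toy case the "windy" split is collapsed because a single leaf classifies the sunny validation examples at least as well, while the root split on "outlook" is kept because removing it would add validation errors. Ties are resolved in favor of the smaller tree, which is one common convention and an assumption of this sketch.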