| Summary: | Data mining is increasingly practiced in the sciences to extract information from the enormous data sets generated by modern experimental and observational methods. It is used to find new, hidden, or unexpected patterns in data. Several data mining techniques can be used to handle such data, and one of them is classification. Classification can be carried out with a single classifier or with multiple classifiers, and a classifier's performance is measured by its accuracy. However, a single classifier alone cannot guarantee good accuracy. Combining different classifiers is therefore considered as a classification approach for achieving higher accuracy. The aim of this thesis is to propose a new multi-classifier model that combines different classifiers in order to achieve higher accuracy in classifying water quality data. The Kinta River dataset was chosen as test data to demonstrate that the proposed approach achieves high accuracy. The model applies fusion classification between different classifiers: two or more classifiers are combined, the fusion output that achieves the highest accuracy is chosen, and it is then combined with further classifiers. The fusion uses Majority Vote as the combination rule, which works only with nominal classes, so each classifier predicts a nominal class label for the Kinta River case study. The classifiers chosen for combination are Naive Bayes (NB), Multilayer Perceptron (MLP), Decision Tree (J48), Sequential Minimal Optimisation (SMO), and instance-based learning with parameter k (IBk). In single classification, the MLP and IBk classifiers achieved the highest accuracy, both at 91.57%, and serve as base classifiers for the next stage (the fusion stage). 
The results showed that, with IBk as the base classifier, the combination IBk+NB+MLP+SMO was superior to the other combinations (IBk+SMO+MLP+J48, IBk+SMO+MLP, IBk+SMO+NB, IBk+SMO+J48, IBk+MLP, IBk+SMO, IBk+NB, IBk+J48). The selected combination achieved a higher accuracy of 93.98% and took less time to build the model. In conclusion, with the right combination of classifiers, accuracy improved compared to a single classifier.
|
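The Majority Vote rule named in the summary can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names (`majority_vote`, `fuse`, `accuracy`), the tie-breaking rule, and the class labels below are all assumptions made for illustration, and the predictions are made-up values rather than Kinta River results.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent nominal label among one instance's predictions.

    Assumption of this sketch: ties go to the label that appears first in
    the list, so listing the base classifier first lets it decide ties.
    """
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


def fuse(per_classifier_predictions):
    """Combine label lists (one list per classifier) instance by instance."""
    return [majority_vote(list(votes)) for votes in zip(*per_classifier_predictions)]


def accuracy(predicted, actual):
    """Fraction of instances whose predicted label matches the actual label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical predictions from three classifiers on four instances.
ibk = ["clean", "polluted", "clean", "polluted"]
nb = ["clean", "clean", "clean", "polluted"]
mlp = ["polluted", "polluted", "clean", "clean"]

fused = fuse([ibk, nb, mlp])
```

Comparing `accuracy(fused, actual)` across candidate combinations, keeping the best one, and then adding another classifier to it would mirror the staged fusion the summary describes.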