| Summary: | In today's strongly competitive environment, two key factors in the success of
higher education institutions are student retention and academic
performance. Institutions therefore need a student retention plan to minimize the number of students who drop out. However, in coping with the student retention
problem, many institutions have difficulty identifying excellent, good,
average and weak students. They are also unable to identify the
factors that strongly influence students' performance. Furthermore,
they are unable to handle and use students' data effectively and efficiently because of
its large volume and complexity. This study aims to apply classification algorithms on
the University College Bestari educational dataset in order to classify students' academic
performance based on their personal background, admission data and previous
academic results, to evaluate the algorithms' performance, and to identify the parameters that
strongly influence students' academic performance by comparing the results
produced. To achieve these objectives, classification techniques were implemented
using the 10-fold cross-validation method to build the training and testing datasets.
Using WEKA 3.8.2 software, this study implemented a decision tree (J48), Naive Bayes
and an artificial neural network (Multilayer Perceptron). Naive Bayes was chosen because
of its ability to work with small amounts of data, whereas J48 produces a decision tree
that can be used to identify the most influential attributes. Multilayer Perceptron was
selected because it uses the backpropagation algorithm, which allows the classifier to
learn adaptively from its mistakes and thus yield accurate results. The classifiers were
implemented on two datasets: one with an unequal class distribution and
the other with a more balanced class distribution. The first dataset used students' class
honours as the target variable, whereas the second used students' performance
level as the target variable. Since one of the datasets is imbalanced, using accuracy as
the only evaluation metric is inadequate. Therefore, four evaluation measures were
used to assess the results: accuracy, sensitivity, specificity and Area Under the Curve
(AUC). Results show that Multilayer Perceptron is the best classifier for the
Honours Degree dataset, with 90.60% accuracy. Meanwhile, Naive Bayes is the best
classifier for the Performance Level dataset, with 70.94% accuracy. Naive Bayes is also able
to correctly identify minority classes. On the other hand, the decision tree performed
poorly compared to the other classifiers, especially on minority classes. The study also
found that first-semester Grade Point Average, general course scores and high school
academic results are important attributes that strongly influence students'
achievement. There is a correlation between students' scores in the Islamic Civilization and
Asian Civilization (TITAS) and Basic Entrepreneurship courses and their final
achievement. The use of classification algorithms in educational data mining could
help institutions and instructors classify students' academic performance and
identify average and weak students, and thus help them make decisions on the
student retention plan.
|
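
The summary's point that accuracy alone is inadequate on an imbalanced dataset can be illustrated with a minimal sketch. The function name `binary_metrics` and the confusion-matrix counts below are hypothetical and chosen only for illustration; they are not taken from the study, which computed its measures in WEKA.

```python
def binary_metrics(tp, fn, fp, tn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Sensitivity: share of actual positives that were predicted positive.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of actual negatives that were predicted negative.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, sensitivity, specificity


# Hypothetical imbalanced case: 10 minority-class (positive) students and
# 90 majority-class (negative) students, with a classifier that labels
# every student as negative.
acc, sens, spec = binary_metrics(tp=0, fn=10, fp=0, tn=90)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this made-up case accuracy looks high even though no minority-class student is identified, which is why sensitivity and specificity are reported alongside it.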