Predicting students’ performance in mathematics subjects at Kolej MARA Banting using machine learning methods

Bibliographic Details
Main Authors: Ahmad Akif, Ibrahim; Nor Azuana, Ramli; Sahimel Azwal, Sulaiman
Format: Article
Language: English
Published: Universiti Pendidikan Sultan Idris 2025
Online Access: https://umpir.ump.edu.my/id/eprint/45505/
Description
Summary: Predicting students’ performance is crucial for personalised education and individual success. However, there is no standard procedure or method that takes external factors into account when predicting students’ performance in mathematics at Kolej MARA Banting (KMB). This research aims to address this problem by exploring the potential of machine learning methods for predicting students’ performance in mathematics at KMB. The study follows a machine learning process: data collection, attribute selection, pre-processing, model training, and evaluation. A sample of 703 data points on students’ demographics, academic records, and mathematics performance was collected and pre-processed. Machine learning models including support vector machine, decision tree, k-nearest neighbours, Naïve Bayes, random forest, AdaBoost, and a stacking model were applied in this study. The accuracy and performance of these models were assessed to determine which model outperformed the others and how effectively it predicted students’ mathematics performance. The findings show that the stacking model outperformed the other models in accuracy (71.43%), precision (68.73%), recall (71.43%), and F1-score (69.80%). Nevertheless, the stacking model achieved only moderate accuracy. This could be attributed to the inherent difficulties in constructing a precise predictive model for student performance, such as the models failing to sufficiently reflect the complexities within the dataset, resulting in underfitting. Additionally, the target attribute, International Baccalaureate (IB) grade, is imbalanced, with more high performers than low performers, causing the models to be biased towards the majority class and impacting overall accuracy.
The performance of the models in this study could be improved by adding more features related to students’ performance, such as anxiety, depression, and well-being, so that the models capture more of the complexity in the data. It is also suggested that samples be obtained from other colleges whose grade distribution is more balanced than that of students at KMB.
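
The workflow summarised above (several base classifiers combined by a stacking model, then scored on accuracy, precision, recall, and F1-score) can be sketched with scikit-learn. This is only an illustrative sketch, not the authors' implementation: the data are synthetic (703 rows with an imbalanced three-class target standing in for the IB grade), and the feature count, class proportions, train/test split, hyperparameters, and the logistic-regression meta-learner are all assumptions. The summary does not say how the per-class metrics were averaged; support-weighted averaging is assumed here, under which recall always equals accuracy, which would be consistent with the two identical 71.43% figures.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data (assumption): 703 rows and an
# imbalanced three-class target playing the role of the IB grade band.
X, y = make_classification(n_samples=703, n_features=10, n_informative=6,
                           n_classes=3, weights=[0.6, 0.3, 0.1],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Base learners named in the summary; distance- and margin-based models
# are wrapped with a scaler. All hyperparameters are library defaults.
base_learners = [
    ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("nb", GaussianNB()),
    ("rf", RandomForestClassifier(random_state=0)),
    ("ada", AdaBoostClassifier(random_state=0)),
]

# The meta-learner is an assumption; the summary does not name one.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
# Support-weighted averaging (assumption): each class's score is weighted
# by its share of the test set.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="weighted", zero_division=0)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

The scores printed for the synthetic data say nothing about the KMB results. Because weighted averages can hide poor performance on the minority class, per-class or macro-averaged scores are a useful companion when the target is imbalanced.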