Leveraging data lake architecture for predicting academic student performance

Bibliographic Details
Main Authors: Abdul Rahim, Shameen Aina, Sidi, Fatimah, Affendey, Lilly Suriani, Ishak, Iskandar, Nurlankyzy, Appak Yessirkep
Format: Article
Language: English
Published: Insight Society 2024
Online Access: http://psasir.upm.edu.my/id/eprint/117456/
http://psasir.upm.edu.my/id/eprint/117456/1/117456.pdf
building UPM Institutional Repository
collection Online Access
description In today's rapidly evolving landscape of higher education, the effective management and analysis of academic data have become increasingly challenging, particularly in the context of the 3Vs of Big Data: volume, variety, and velocity. The amount of data produced by educational institutions, including student records, has increased dramatically. This flood of data takes several forms and originates from various sources, such as learning management systems and student information systems. Hence, in education, data analytics and predictive modeling have become increasingly significant for gaining insight into student performance, such as identifying at-risk students who are most likely to fail their courses. This study proposes a novel approach for predicting student academic performance, particularly identifying at-risk students, by leveraging a data lake architecture. The proposed methodology comprises the ingestion, transformation, and quality assessment of a combined data source from Universiti Putra Malaysia's Student Information System and learning management system within the data lake environment. With its parallel processing capabilities, this centralized data repository facilitates the training and evaluation of various machine learning models for prediction. Machine learning algorithms such as Support Vector Classifier, Naive Bayes, and Decision Trees are used to build the prediction models, taking advantage of the data lake's scalability and parallel processing capabilities. This study lays a solid groundwork for using data architecture to improve students' performance.
id upm-117456
institution Universiti Putra Malaysia
institution_category Local University
recordtype eprints
repository_type Digital Repository
spelling upm-117456 2025-05-23T08:56:40Z http://psasir.upm.edu.my/id/eprint/117456/
Insight Society 2024-12-25 Article PeerReviewed text en cc_by_4 http://psasir.upm.edu.my/id/eprint/117456/1/117456.pdf
Abdul Rahim, Shameen Aina and Sidi, Fatimah and Affendey, Lilly Suriani and Ishak, Iskandar and Nurlankyzy, Appak Yessirkep (2024) Leveraging data lake architecture for predicting academic student performance. International Journal on Advanced Science, Engineering and Information Technology, 14 (6). pp. 2121-2129. ISSN 2088-5334; eISSN: 2460-6952
https://ijaseit.insightsociety.org/index.php/ijaseit/article/view/12408
DOI: 10.18517/ijaseit.14.6.12408
url http://psasir.upm.edu.my/id/eprint/117456/
http://psasir.upm.edu.my/id/eprint/117456/1/117456.pdf