A performance analysis of prediction techniques in handling high-dimensional uncertain data for the application of skyline query over data stream

Bibliographic Details
Main Authors: Ahmed Mohamud, Mudathir, Ibrahim, Hamidah, Sidi, Fatimah, Mohd Rum, Siti Nurulain, Dzolkhifli, Zarina, Xiaowei, Zhang
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2024
Online Access: http://psasir.upm.edu.my/id/eprint/114855/
http://psasir.upm.edu.my/id/eprint/114855/1/114855.pdf
author Ahmed Mohamud, Mudathir
Ibrahim, Hamidah
Sidi, Fatimah
Mohd Rum, Siti Nurulain
Dzolkhifli, Zarina
Xiaowei, Zhang
author_sort Ahmed Mohamud, Mudathir
building UPM Institutional Repository
collection Online Access
description The proliferation of high-dimensional data in many advanced database applications is a result of today's technological advancements. The data points that correspond to objects often lack a precise description, which makes their representation uncertain. While the concept of data streaming is not new, its practical uses have only recently begun to emerge. This research focuses on continuous range data - a type of uncertain data common in database applications - that do not have explicit representations of their exact values. Furthermore, the identification of skyline objects - one of the popular database applications - becomes more challenging when skylines are to be identified from a collection of continuously generated input data streams in which objects might have imprecise values. This makes it imperative to determine which approach estimates or predicts the uncertain values most accurately while also handling massive streams of continuously generated data and analyzing them almost instantly to provide accurate and timely responses. Given this, the following techniques are selected: Linear Regression (LR), k-Nearest Neighbour (k-NN), Random Forest (RF), Decision Trees (DT), and Centre and Range Method (CRM). Their effectiveness is evaluated in terms of execution time, precision, recall, F1-score, and root mean square error (RMSE). Additionally, to verify the accuracy of each prediction technique, the predicted data derived from its model are used to derive skyline objects, which are subsequently compared to the actual skyline results. An inaccurate prediction of a continuous range value would result in an incorrect set of skyline objects.
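The evaluation idea in the description (predict exact values for range data, derive a skyline from the predictions, and compare it with the actual skyline) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the toy objects, the convention that smaller values are better, the naive pairwise skyline, and the midpoint predictor, which is only a stand-in baseline and not the article's LR, k-NN, RF, DT, or CRM models.

```python
import math

def dominates(a, b):
    """a dominates b: no worse in every dimension and strictly better in one.
    Assumption: smaller values are preferred in all dimensions."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Ids of objects not dominated by any other object (naive pairwise check)."""
    return {
        pid for pid, p in points.items()
        if not any(dominates(q, p) for qid, q in points.items() if qid != pid)
    }

def rmse(actual, predicted):
    """Root mean square error over all attribute values of all objects."""
    errs = [(a - p) ** 2
            for pid in actual
            for a, p in zip(actual[pid], predicted[pid])]
    return math.sqrt(sum(errs) / len(errs))

def precision_recall_f1(actual_set, predicted_set):
    """Compare the predicted skyline against the actual skyline."""
    tp = len(actual_set & predicted_set)
    precision = tp / len(predicted_set) if predicted_set else 0.0
    recall = tp / len(actual_set) if actual_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical ground truth: exact values in two dimensions.
actual = {"a": (1.0, 5.0), "b": (3.0, 2.0), "c": (4.0, 4.0), "d": (2.0, 3.0)}

# Hypothetical uncertain view: the first dimension is known only as a
# continuous range (low, high); the second dimension is exact.
uncertain = {
    "a": ((1.0, 5.0), 5.0),
    "b": ((2.0, 4.0), 2.0),
    "c": ((3.0, 5.0), 4.0),
    "d": ((1.0, 3.0), 3.0),
}

# Stand-in predictor: replace each range by its midpoint.
predicted = {pid: ((lo + hi) / 2, y) for pid, ((lo, hi), y) in uncertain.items()}

actual_skyline = skyline(actual)        # {'a', 'b', 'd'}
predicted_skyline = skyline(predicted)  # {'b', 'd'}
precision, recall, f1 = precision_recall_f1(actual_skyline, predicted_skyline)
error = rmse(actual, predicted)

print(sorted(actual_skyline), sorted(predicted_skyline))
print(round(precision, 3), round(recall, 3), round(f1, 3), round(error, 3))
```

In this toy case the midpoint overestimates the first attribute of object "a", so "a" becomes dominated by "b" and drops out of the predicted skyline. A single inaccurate prediction is enough to lower recall, which is the effect the description's final sentence points to.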
first_indexed 2025-11-15T14:23:25Z
format Article
id upm-114855
institution Universiti Putra Malaysia
institution_category Local University
language English
last_indexed 2025-11-15T14:23:25Z
publishDate 2024
publisher Institute of Electrical and Electronics Engineers Inc.
recordtype eprints
repository_type Digital Repository
spelling upm-114855 2025-02-05T02:16:55Z http://psasir.upm.edu.my/id/eprint/114855/ Institute of Electrical and Electronics Engineers Inc. 2024-08-28 Article PeerReviewed text en cc_by_nc_nd_4 http://psasir.upm.edu.my/id/eprint/114855/1/114855.pdf Ahmed Mohamud, Mudathir and Ibrahim, Hamidah and Sidi, Fatimah and Mohd Rum, Siti Nurulain and Dzolkhifli, Zarina and Xiaowei, Zhang (2024) A performance analysis of prediction techniques in handling high-dimensional uncertain data for the application of skyline query over data stream. IEEE Access, 12. pp. 120877-120898. ISSN 2169-3536 https://ieeexplore.ieee.org/document/10654264/ 10.1109/ACCESS.2024.3450863
title A performance analysis of prediction techniques in handling high-dimensional uncertain data for the application of skyline query over data stream
url http://psasir.upm.edu.my/id/eprint/114855/
http://psasir.upm.edu.my/id/eprint/114855/1/114855.pdf