A performance analysis of prediction techniques in handling high-dimensional uncertain data for the application of skyline query over data stream
The proliferation of high-dimensional data in many advanced database applications is a result of today’s technological advancements. The data points that correspond to objects often lack a precise description, which makes their representation uncertain. While the concept of data streaming is...
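| Main Authors: | Mohamud, Mudathir Ahmed; Hamidah D., Ibrahim; Fatimah, Sidi; Siti Nurulain, Mohd Rum; Zarina, Dzolkhifli; Xiaowei, Zhang |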
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE 2024 |
| Subjects: | QA75 Electronic computers. Computer science |
| Online Access: | https://umpir.ump.edu.my/id/eprint/44181/ |
| _version_ | 1848827312563290112 |
|---|---|
| author | Mohamud, Mudathir Ahmed Hamidah D., Ibrahim Fatimah, Sidi Siti Nurulain, Mohd Rum Zarina, Dzolkhifli Xiaowei, Zhang |
| author_sort | Mohamud, Mudathir Ahmed |
| building | UMP Institutional Repository |
| collection | Online Access |
| description | The proliferation of high-dimensional data in many advanced database applications is a result of today’s technological advancements. The data points that correspond to objects often lack a precise description, which makes their representation uncertain. While the concept of data streaming is not new, its practical uses are only recently emerging. This research focuses on continuous range data, a type of uncertain data common in database applications, which do not have explicit representations of their exact values. Furthermore, the identification of skyline objects, one of the popular database applications, becomes more challenging when skylines are to be identified from a collection of continuously generated input data streams where objects might have imprecise values. This makes it imperative to determine which approach has the optimal accuracy for estimating or predicting the uncertain values while also being able to handle massive streams of continuously generated data and analyze them almost instantly to provide accurate and timely responses. Given this, the following techniques are selected: Linear Regression (LR), k-Nearest Neighbour (k-NN), Random Forest (RF), Decision Trees (DT), and Centre and Range Method (CRM). Their effectiveness is evaluated in terms of execution time, precision, recall, F1-score, and root mean square error (RMSE). Additionally, in order to verify the accuracy of each prediction technique, the predicted data derived from its model is used to derive skyline objects, which are subsequently compared to the actual skyline results. An inaccurate prediction of a continuous range value would result in an incorrect set of skyline objects. |
| first_indexed | 2025-11-15T03:58:43Z |
| format | Article |
| id | ump-44181 |
| institution | Universiti Malaysia Pahang |
| institution_category | Local University |
| language | English |
| last_indexed | 2025-11-15T03:58:43Z |
| publishDate | 2024 |
| publisher | IEEE |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | ump-44181 2025-08-06T00:55:48Z https://umpir.ump.edu.my/id/eprint/44181/ IEEE 2024 Article PeerReviewed pdf en cc_by_nc_nd_4 https://umpir.ump.edu.my/id/eprint/44181/1/A%20performance%20analysis%20of%20prediction%20techniques.pdf Mohamud, Mudathir Ahmed and Hamidah D., Ibrahim and Fatimah, Sidi and Siti Nurulain, Mohd Rum and Zarina, Dzolkhifli and Xiaowei, Zhang (2024) A performance analysis of prediction techniques in handling high-dimensional uncertain data for the application of skyline query over data stream. IEEE Access, 12. pp. 120877-120898. ISSN 2169-3536. (Published) https://doi.org/10.1109/ACCESS.2024.3450863 |
| title | A performance analysis of prediction techniques in handling high-dimensional uncertain data for the application of skyline query over data stream |
| title_sort | performance analysis of prediction techniques in handling high-dimensional uncertain data for the application of skyline query over data stream |
| topic | QA75 Electronic computers. Computer science |
| url | https://umpir.ump.edu.my/id/eprint/44181/ |
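
The evaluation described in the abstract (predict exact values from continuous ranges, derive the skyline from the predictions, then compare it with the actual skyline) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names are hypothetical, smaller values are assumed to be preferred in every dimension, and a simple range-midpoint estimate stands in for the predictor. It is not an implementation of LR, k-NN, RF, DT, or CRM.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]
Range = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one.

    Assumption: smaller values are preferred in all dimensions.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Sequence[Point]) -> Set[int]:
    """Return the indices of points that no other point dominates."""
    return {
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    }


def midpoint_estimate(ranges: Sequence[Range]) -> Point:
    """Hypothetical stand-in predictor: midpoint of each [low, high] range."""
    return tuple((low + high) / 2 for low, high in ranges)


def precision_recall_f1(predicted: Set[int], actual: Set[int]) -> Tuple[float, float, float]:
    """Compare a predicted skyline with the actual skyline (sets of object ids)."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def rmse(predicted: Sequence[Point], actual: Sequence[Point]) -> float:
    """Root mean square error over every attribute of every object."""
    squared = [
        (p - a) ** 2
        for p_obj, a_obj in zip(predicted, actual)
        for p, a in zip(p_obj, a_obj)
    ]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Made-up toy data: four objects, two dimensions, each value given as a range.
    ranges: List[List[Range]] = [
        [(0.0, 2.0), (8.0, 10.0)],
        [(2.0, 8.0), (2.0, 8.0)],
        [(8.0, 10.0), (0.0, 2.0)],
        [(4.0, 5.0), (4.0, 5.0)],
    ]
    # Made-up "true" values lying inside those ranges.
    actual = [(1.0, 9.0), (3.0, 3.0), (9.0, 1.0), (5.0, 5.0)]

    predicted = [midpoint_estimate(r) for r in ranges]
    print("predicted skyline:", sorted(skyline(predicted)))
    print("actual skyline:   ", sorted(skyline(actual)))
    print("precision, recall, F1:", precision_recall_f1(skyline(predicted), skyline(actual)))
    print("RMSE:", rmse(predicted, actual))
```

In this toy data the wide range of the second object pulls its midpoint away from its true value, so a different object enters the predicted skyline. This is the effect the abstract points to when it notes that an inaccurate prediction yields an incorrect set of skyline objects.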