Efficient management of Top-k queries over Uncertain Data Streams with dynamic Sliding Window Model

Bibliographic Details
Main Author: Raja Wahab, Raja Azhan Syah
Format: Thesis
Language: English
Published: 2024
Subjects:
Online Access: http://psasir.upm.edu.my/id/eprint/120035/
http://psasir.upm.edu.my/id/eprint/120035/1/120035.pdf
Description
Summary: Advances in information technology have created a growing need for continuous processing of significant events, for example in road speed monitoring and mobile computing. Query processing over an Uncertain Data Stream (UDS) is challenging in many technical contexts owing to the data's inherent inconsistency, ambiguity, and time delay in interpreting information. The large volume of data and its frequent changes over short periods make conventional processing methods insufficient. The main issues are minimizing redundant scans of the whole data set, improving uncertainty computation, and processing only the most recent tuple items. In a UDS, the number of possible world instances grows exponentially, so answering Top-k queries in the shortest possible time is extremely challenging. More studies are also needed on UDS processing under the Sliding Window Model (SWM). Inefficient approaches to processing continuous queries on a UDS over the SWM increase the complexity of the semantic trade-off between result sets with maximum probability and result sets with high scores. Current research on uncertainty centers on specially tailored algorithms that operate in the presence of value uncertainty using both count-based and time-based approaches. This study proposes a framework for processing Top-k queries in UDS that leverages the efficiency of the SWM through the SWMTop-kDelta algorithm. After the model's rules and probability theory were established, a method was designed to support the Top-k processing algorithm over the SWM until the potential Top-k candidates expire. The study also presents an improved optimization method for reducing computational redundancy in SWM and Top-k query computation. This method reduces computational costs by efficiently handling the insertion and exit policy for the appropriate tuple candidates within a specified window frame. The experiments compare the SWMTop-kDelta algorithm with two algorithms from previous research and two baseline algorithms to evaluate their effectiveness. The algorithm development combines the frameworks from Phases 1 to 3 and is evaluated on real and synthetic datasets. Efficiency is assessed by comparing the number of possible worlds and processing times. Each experiment was run three times and the mean value was recorded. SWMTop-kDelta performs consistently well as the data set size increases and across values of the parameter k. Even where the initial improvement is slight, performance can be improved further by increasing the number of window segmentations, decreasing the window size, reducing the number of queries, and adjusting the probability threshold (d) more frequently. It demonstrates an improvement of 30%–90% compared to the other methods, with consistent performance and strong scalability. This study contributes to the field of Top-k query processing.
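
To make the problem setting in the summary concrete, the following is a minimal sketch of a naive baseline: a count-based sliding window over uncertain tuples, with Top-k membership probabilities computed by enumerating every possible world. It is not the SWMTop-kDelta algorithm, whose details are not given in this record. The function names, the tuple layout `(id, score, probability)`, the assumption that tuples exist independently, and the interpretation of the threshold as a minimum Top-k membership probability are all illustrative assumptions.

```python
from collections import deque
from itertools import product


def topk_probabilities(window, k):
    """Brute force over all 2^n possible worlds of the window.

    window: list of (tuple_id, score, prob), where prob is the (assumed
    independent) probability that the tuple exists.
    Returns {tuple_id: probability that the tuple is among the top k}.
    """
    result = {tid: 0.0 for tid, _, _ in window}
    for world in product([False, True], repeat=len(window)):
        world_prob = 1.0
        present = []
        for exists, (tid, score, prob) in zip(world, window):
            world_prob *= prob if exists else 1.0 - prob
            if exists:
                present.append((score, tid))
        # Rank the tuples that exist in this world by score, highest first.
        present.sort(reverse=True)
        for _, tid in present[:k]:
            result[tid] += world_prob
    return result


def sliding_topk(stream, window_size, k, threshold):
    """Count-based window: the oldest tuple expires when a new one arrives.

    After each arrival, yields the ids whose top-k probability is at
    least `threshold` (a hypothetical reading of the threshold).
    """
    window = deque(maxlen=window_size)
    for item in stream:
        window.append(item)
        probs = topk_probabilities(list(window), k)
        yield sorted(tid for tid, p in probs.items() if p >= threshold)


if __name__ == "__main__":
    stream = [("a", 90, 0.5), ("b", 80, 1.0), ("c", 70, 1.0)]
    for answer in sliding_topk(stream, window_size=2, k=1, threshold=0.5):
        print(answer)
```

Because the inner loop visits 2^n worlds for a window of n tuples and recomputes everything on each arrival, this baseline shows the exponential growth and redundant rescanning that the summary identifies as the main costs to avoid.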