Efficient management of Top-k queries over Uncertain Data Streams with dynamic Sliding Window Model
| Main Author: | |
|---|---|
| Format: | Thesis |
| Language: | English |
| Published: | 2024 |
| Subjects: | |
| Online Access: | http://psasir.upm.edu.my/id/eprint/120035/ http://psasir.upm.edu.my/id/eprint/120035/1/120035.pdf |
| Summary: | Today, the advancement of information technology has led to a growing need
for continuous processing of significant events, such as enhanced methods for
monitoring road speed and mobile computing. The Uncertain Data Stream
(UDS) utilized for query processing can pose challenges in many technical
contexts owing to its inherent inconsistency, ambiguity, and time delay in
interpreting information. The large amount of data generated and frequent
changes in a short time make conventional processing methods insufficient.
The main issues are minimizing redundant scans of the whole data set,
improving uncertainty computation, and only processing the most recent tuple
items. In UDS, the number of possible world instances grows exponentially,
and understanding what is required to achieve Top-k query processing in the
shortest possible time can be extremely challenging. Moreover, more studies
are needed that investigate UDS under the Sliding Window Model (SWM).
Inefficient approaches to processing continuous queries on UDS over the SWM
increase the complexity of the semantic trade-off between returning
maximum-probability and high-scoring result sets. Current research on
tackling uncertainty revolves
around creating specifically tailored algorithms that can operate in the
presence of value uncertainty using both a count-based and a time-based
approach. This study aims to propose a framework for processing Top-k
queries in UDS, where the focus is on leveraging the efficiency of the SWM,
achieved through the SWMTop-kDelta algorithm. After establishing this
model's rules and probability theory, a method was designed to support the
Top-k processing algorithm over the SWM until the Top-k potential candidates
expired. This study also provides an overview of an improved optimization
method for tackling computational redundancy in the context of SWM and Top-k
query computation. This method reduces computational costs by efficiently
handling the insertion and exit policy for the appropriate tuple candidates within
a specified window frame. The experiments in this study compare the
SWMTop-kDelta algorithm with the algorithms of two previous researchers and
two baseline algorithms to evaluate their effectiveness. The algorithm
development combines the frameworks from Phases 1 to 3 and is evaluated on
real and synthetic datasets. Efficiency is assessed by comparing the number of
possible worlds and processing times. Each experiment was run three times,
and the mean of the three runs was recorded. SWMTop-kDelta performs
consistently well as the data set size increases, regardless of the value of
the parameter k. Even if the initial
improvement is only slight, performance can consistently improve by making
certain adjustments, such as increasing the number of window segmentations,
decreasing the window size, reducing the number of queries, and adjusting the
probability threshold (d) more frequently. The algorithm demonstrates a
significant improvement of 30%–90% compared to other methods, thanks to its
consistent performance and strong scalability. This study will make a valuable
contribution to the field of Top-k computational query processing. |
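
The summary refers to possible worlds growing exponentially and to a probability threshold applied over a count-based sliding window. The following is a minimal Python sketch of those general ideas only; it is not the SWMTop-kDelta algorithm, whose details the record does not give. It assumes, for illustration, that each tuple is an `(id, score, probability)` triple, that tuples exist independently of one another, and that scores are distinct. The function names `topk_probabilities`, `topk_probabilities_bruteforce`, and `sliding_topk` are hypothetical.

```python
from collections import deque
from itertools import product


def topk_probabilities(window, k):
    """Map each tuple id to the probability that it ranks among the k
    highest-scoring tuples that actually exist.

    Assumption (not from the thesis): tuples exist independently.
    """
    ranked = sorted(window, key=lambda t: t[1], reverse=True)
    # dist[j] = probability that exactly j higher-scoring tuples exist, j < k
    dist = [1.0] + [0.0] * (k - 1)
    result = {}
    for tid, _score, prob in ranked:
        # The tuple is in the top-k if it exists and fewer than k
        # higher-scoring tuples exist.
        result[tid] = prob * sum(dist)
        # Fold this tuple into the count distribution (counts >= k are dropped).
        updated = [0.0] * k
        for j in range(k):
            updated[j] = dist[j] * (1.0 - prob)
            if j > 0:
                updated[j] += dist[j - 1] * prob
        dist = updated
    return result


def topk_probabilities_bruteforce(window, k):
    """Same quantity, computed by enumerating all 2**n possible worlds."""
    items = list(window)
    result = {tid: 0.0 for tid, _, _ in items}
    for world in product([False, True], repeat=len(items)):
        weight = 1.0
        present = []
        for exists, (tid, score, prob) in zip(world, items):
            weight *= prob if exists else 1.0 - prob
            if exists:
                present.append((score, tid))
        for _score, tid in sorted(present, reverse=True)[:k]:
            result[tid] += weight
    return result


def sliding_topk(stream, window_size, k, threshold):
    """Yield, after each arrival, the ids whose top-k probability meets the
    threshold within a count-based window of the most recent tuples."""
    window = deque(maxlen=window_size)  # the oldest tuple is evicted when full
    for item in stream:
        window.append(item)
        probs = topk_probabilities(window, k)
        yield sorted(tid for tid, p in probs.items() if p >= threshold)
```

The brute-force version makes the exponential cost of possible-world enumeration explicit, while the dynamic-programming version reaches the same probabilities in O(n·k) time per window. Recomputing the whole window on every arrival, as done here, is the kind of redundant scanning the thesis aims to reduce.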