An efficient sampling scheme for approximate processing of decision support queries

Bibliographic Details
Main Authors: Rudra, Amit; Gopalan, Raj; Achuthan, Narasimaha
Other Authors: José Cordeiro; Leszek Maciaszek; Alfredo Cuzzocrea
Format: Conference Paper
Published: INSTICC 2012
Subjects: Data Warehousing; Approximate Query Processing; Sampling
Online Access: http://hdl.handle.net/20.500.11937/28648

Abstract

Decision support queries usually involve accessing enormous amounts of data, which requires significant retrieval time. Faster retrieval of query results can often save precious time for the decision maker. Pre-computation of materialised views and sampling are two ways of achieving a significant speed-up. However, drawing random samples for queries on range-restricted attributes has two problems: small random samples may miss relevant records, and drawing larger samples from disk can be inefficient because of the large number of disk accesses required. In this paper, we propose an efficient indexing scheme for quickly drawing relevant samples for data warehouse queries, and we propose the concepts of database and sample relevancy ratios. We describe a method for estimating query results for range-restricted queries using this index and experimentally evaluate the scheme using a relatively large real dataset. Further, we compute confidence intervals for the estimates to investigate whether the results can be guaranteed to be within the desired level of confidence. Our experiments on data from a retail data warehouse show promising results. We also report the levels of accuracy achieved for various types of aggregate queries and relate them to the database relevancy ratios of the queries.
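For reference, the sketch below shows the conventional baseline that the abstract argues against: estimating a range-restricted SUM from a plain uniform random sample, with a normal-approximation confidence interval. It is not the paper's index-based sampling scheme, and it does not implement the paper's relevancy ratios. The function name `estimate_range_sum`, its parameters, and the in-memory list of `(key, value)` rows are hypothetical and chosen only for illustration. The estimator is the standard expansion estimator for a population total under simple random sampling without replacement, with a finite population correction.

```python
import math
import random
from statistics import NormalDist


def estimate_range_sum(population, n, lo, hi, confidence=0.95, seed=0):
    """Hypothetical helper: estimate SUM(value) WHERE lo <= key <= hi.

    Draws a simple random sample of n rows without replacement from
    `population`, a list of (key, value) pairs. Returns
    (estimate, ci_half_width, relevant_rows_in_sample).
    """
    N = len(population)
    if not 2 <= n <= N:
        raise ValueError("sample size must satisfy 2 <= n <= len(population)")
    rng = random.Random(seed)
    sample = rng.sample(population, n)

    # Each sampled row contributes its value if it satisfies the range
    # predicate and 0 otherwise, so the query total is N * mean(y).
    y = [v if lo <= k <= hi else 0.0 for k, v in sample]
    relevant = sum(1 for k, _ in sample if lo <= k <= hi)

    mean = sum(y) / n
    var = sum((yi - mean) ** 2 for yi in y) / (n - 1)  # sample variance
    estimate = N * mean

    # Standard error of the estimated total, with finite population correction.
    se = N * math.sqrt(var / n * (1 - n / N))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return estimate, z * se, relevant
```

A sample of n rows contains, on average, n times the fraction of rows that satisfy the predicate. With a highly selective range (say 10 keys out of 100,000 rows and n = 100), the sample will usually contain no relevant rows, so the estimate and the interval both collapse to zero. This is the "small random samples may miss relevant records" problem stated in the abstract.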