Selecting adequate samples for approximate decision support queries

Bibliographic Details
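Main Authors: Rudra, Amit; Gopalan, Raj; Achuthan, Narasimaha
Other Authors: Salimane Hammoudi; Leszek Maciaszek; Jose Cordeiro; Jan Dietz
Format: Conference Paper
Published: Science and Technology Publications, 2013
Subjects: Data Warehousing; Inverse Simple Random Sample without Replacement (SRSWOR); Approximate Query Processing; Sampling
Online Access: http://hdl.handle.net/20.500.11937/33606
Institution: Curtin University Malaysia
Repository: Curtin Institutional Repository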

Description
For highly selective queries, a simple random sample of records drawn from a large data warehouse may not contain a sufficient number of records that satisfy the query conditions. Efficient sampling schemes for such queries require innovative techniques that can access the records relevant to each specific query. When drawing the sample, it is advantageous to know what sample size would be adequate for a given query. This paper proposes methods for picking adequate samples that yield approximate query results with a desired level of accuracy. A special index based on a structure known as the k-MDI Tree is used to draw samples. An unbiased estimator based on inverse simple random sampling without replacement (inverse SRSWOR) is adapted to estimate adequate sample sizes for queries. The methods are evaluated experimentally on a large real-life data set. The results show that adequate sample sizes can be determined such that the errors in the outputs of most queries are within the acceptable limit of 5%.
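The abstract names inverse simple random sampling without replacement as the basis for estimating adequate sample sizes. The sketch below shows only the classic textbook form of inverse SRSWOR, not the authors' adaptation or their k-MDI Tree sampler: records are drawn in random order, without replacement, until m of them satisfy the query, and (m - 1)/(n - 1), where n is the number of draws, is an unbiased estimate of the fraction of qualifying records (m must be at least 2). The helper names (`inverse_srswor`, `selectivity_estimate`, `srswor_sample_size`, `draws_needed`), the assumed coefficient of variation and the way the pieces are chained together are all hypothetical; `srswor_sample_size` is the standard Cochran formula with finite population correction, swapped in purely for illustration.

```python
# Hypothetical sketch: classic inverse SRSWOR plus textbook sample-size
# formulas. This is NOT the paper's k-MDI Tree method or its adaptation.
import math
import random


def inverse_srswor(records, predicate, m, rng=None):
    """Draw records in uniformly random order, without replacement, until m
    of them satisfy `predicate`. Returns (qualifying_records, n_drawn).
    If fewer than m records qualify, the whole population is scanned and the
    returned list is shorter than m (the exact fraction is then known)."""
    if m < 2:
        raise ValueError("m must be >= 2 for the (m - 1) / (n - 1) estimator")
    rng = rng or random.Random()
    order = list(range(len(records)))
    rng.shuffle(order)  # a random permutation gives an SRSWOR draw order
    hits = []
    for n_drawn, idx in enumerate(order, start=1):
        if predicate(records[idx]):
            hits.append(records[idx])
            if len(hits) == m:
                return hits, n_drawn
    return hits, len(order)


def selectivity_estimate(m, n_drawn):
    """Unbiased estimate of the qualifying fraction under inverse sampling."""
    return (m - 1) / (n_drawn - 1)


def srswor_sample_size(cv, rel_error, population_size, z=1.96):
    """Textbook (Cochran) SRSWOR size for estimating a mean to within
    `rel_error` (relative) at the confidence implied by `z`, given an assumed
    coefficient of variation `cv`, with finite population correction."""
    n0 = (z * cv / rel_error) ** 2
    return math.ceil(n0 / (1 + n0 / population_size))


def draws_needed(qualifying_needed, p_hat):
    """Rough number of records to draw so that about `qualifying_needed` of
    them satisfy the query, given an estimated selectivity `p_hat` > 0."""
    return math.ceil(qualifying_needed / p_hat)


if __name__ == "__main__":
    # Hypothetical table: 100,000 record ids, 2% of which satisfy the query.
    table = list(range(100_000))
    hits, n = inverse_srswor(table, lambda r: r % 50 == 0, m=30,
                             rng=random.Random(42))
    p_hat = selectivity_estimate(len(hits), n)
    # Qualifying records needed for a 5% relative error, assuming cv = 1.0.
    q = srswor_sample_size(cv=1.0, rel_error=0.05,
                           population_size=round(p_hat * len(table)))
    print(f"drew {n} records, p_hat = {p_hat:.4f}, need ~{q} qualifying, "
          f"~{draws_needed(q, p_hat)} draws in total")
```

Shuffling every record id costs O(N) and merely stands in for the index-driven access the abstract describes; a practical system would reach candidate records through an index rather than a full permutation of the table.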