Selecting adequate samples for approximate decision support queries

For highly selective queries, a simple random sample of records drawn from a large data warehouse may not contain a sufficient number of records that satisfy the query conditions. Efficient sampling schemes for such queries require innovative techniques that can access records that are relevant to eac...

Bibliographic Details
Main Authors:	Rudra, Amit; Gopalan, Raj; Achuthan, Narasimaha
Other Authors:	Salimane Hammoudi
Format:	Conference Paper
Published:	Science and Technology Publications 2013
Subjects:	Data Warehousing; Inverse Simple Random Sample without Replacement (SRSWOR); Approximate Query Processing; Sampling
Online Access:	http://hdl.handle.net/20.500.11937/33606

_version_	1848753993681993728
author	Rudra, Amit; Gopalan, Raj; Achuthan, Narasimaha
author2	Salimane Hammoudi
author_facet	Salimane Hammoudi; Rudra, Amit; Gopalan, Raj; Achuthan, Narasimaha
author_sort	Rudra, Amit
building	Curtin Institutional Repository
collection	Online Access
description	For highly selective queries, a simple random sample of records drawn from a large data warehouse may not contain a sufficient number of records that satisfy the query conditions. Efficient sampling schemes for such queries require innovative techniques that can access records that are relevant to each specific query. In drawing the sample, it is advantageous to know what would be an adequate sample size for a given query. This paper proposes methods for picking adequate samples that ensure approximate query results with a desired level of accuracy. A special index based on a structure known as the k-MDI Tree is used to draw samples. An unbiased estimator named inverse simple random sampling without replacement is adapted to estimate adequate sample sizes for queries. The methods are evaluated experimentally on a large real-life data set. The results of the evaluation show that adequate sample sizes can be determined such that errors in the outputs of most queries are within the acceptable limit of 5%.
first_indexed	2025-11-14T08:33:21Z
format	Conference Paper
id	curtin-20.500.11937-33606
institution	Curtin University Malaysia
institution_category	Local University
last_indexed	2025-11-14T08:33:21Z
publishDate	2013
publisher	Science and Technology Publications
recordtype	eprints
repository_type	Digital Repository
spelling	curtin-20.500.11937-33606 2023-02-13T08:01:34Z Selecting adequate samples for approximate decision support queries Rudra, Amit Gopalan, Raj Achuthan, Narasimaha Salimane Hammoudi Leszek Maciaszek Jose Cordeiro Jan Dietz Data Warehousing Inverse Simple Random Sample without Replacement (SRSWOR) Approximate Query Processing Sampling For highly selective queries, a simple random sample of records drawn from a large data warehouse may not contain a sufficient number of records that satisfy the query conditions. Efficient sampling schemes for such queries require innovative techniques that can access records that are relevant to each specific query. In drawing the sample, it is advantageous to know what would be an adequate sample size for a given query. This paper proposes methods for picking adequate samples that ensure approximate query results with a desired level of accuracy. A special index based on a structure known as the k-MDI Tree is used to draw samples. An unbiased estimator named inverse simple random sampling without replacement is adapted to estimate adequate sample sizes for queries. The methods are evaluated experimentally on a large real-life data set. The results of the evaluation show that adequate sample sizes can be determined such that errors in the outputs of most queries are within the acceptable limit of 5%. 2013 Conference Paper http://hdl.handle.net/20.500.11937/33606 Science and Technology Publications fulltext
title	Selecting adequate samples for approximate decision support queries
topic	Data Warehousing; Inverse Simple Random Sample without Replacement (SRSWOR); Approximate Query Processing; Sampling
url	http://hdl.handle.net/20.500.11937/33606

Similar Items