Picking adequate samples for approximate decision support queries using inverse SRSWOR

A simple random sample of records from a large data warehouse may not contain a sufficient number of records that satisfy highly selective queries. Efficient sampling schemes for such queries involve using innovative techniques that can access records that are relevant to specific queries. In drawing...

Bibliographic Details
Main Authors: Rudra, Amit; Gopalan, Raj; Achuthan, Narasimaha
Other Authors: Ford Lumban Gaol
Format: Conference Paper
Published: IJISCA 2012
Subjects: sampling; data warehousing; approximate query processing
Online Access: http://hdl.handle.net/20.500.11937/16253
_version_ 1848749122961539072
author Rudra, Amit
Gopalan, Raj
Achuthan, Narasimaha
author2 Ford Lumban Gaol
building Curtin Institutional Repository
collection Online Access
description A simple random sample of records from a large data warehouse may not contain a sufficient number of records that satisfy highly selective queries. Efficient sampling schemes for such queries involve using innovative techniques that can access records that are relevant to specific queries. In drawing the sample, it is advantageous to know what would be an adequate sample size for a given query. This paper proposes methods for picking adequate samples that ensure approximate query results with a desired level of accuracy. A special index based on a structure known as the k-MDI Tree is used to draw samples. An unbiased estimator, inverse simple random sampling without replacement (inverse SRSWOR), is adapted to estimate adequate sample sizes for queries. The methods are evaluated experimentally on a large real-life data set. The results of the evaluation show that adequate sample sizes can be determined with errors in the outputs of most queries within the acceptable limit of 5%.
first_indexed 2025-11-14T07:15:55Z
format Conference Paper
id curtin-20.500.11937-16253
institution Curtin University Malaysia
institution_category Local University
last_indexed 2025-11-14T07:15:55Z
publishDate 2012
publisher IJISCA
recordtype eprints
repository_type Digital Repository
spelling curtin-20.500.11937-16253 2017-01-30T11:54:48Z restricted
title Picking adequate samples for approximate decision support queries using inverse SRSWOR
topic sampling
data warehousing
approximate query processing
url http://hdl.handle.net/20.500.11937/16253