Approximate Query Processing on High Dimensionality Database Tables Using Multidimensional Cluster Sampling View

Bibliographic Details
Main Authors: Inoue, T., Krishna, Aneesh, Gopalan, Raj
Format: Journal Article
Published: 2016
Online Access: http://hdl.handle.net/20.500.11937/25472
_version_ 1848751718647463936
author Inoue, T.
Krishna, Aneesh
Gopalan, Raj
author_facet Inoue, T.
Krishna, Aneesh
Gopalan, Raj
author_sort Inoue, T.
building Curtin Institutional Repository
collection Online Access
description Approximate query processing based on random sampling is one of the most useful methods for efficient computation over large quantities of data kept in databases. However, small samples obtained through random sampling might lack the data relevant to query conditions because the samples do not adequately represent the entire dataset. The Multidimensional Cluster Sampling View has been proposed to support efficient and effective approximate query processing on common database tables. This view allows random sample records to be drawn from a database efficiently and effectively using SQL. The effectiveness of approximate query processing with this view was previously demonstrated on a large database table with only four dimensions. This differs from the usual number of dimensions in decision support systems, which is most commonly over ten. Therefore, further examination and evaluation focusing on dimensionality, with data of ten or more dimensions, is required to demonstrate its practicality. This paper evaluates whether the number of dimensions has an impact on the accuracy of the approximation and on the performance of the Multidimensional Cluster Sampling View. The results of the evaluation show no visible effect of dimensionality.
first_indexed 2025-11-14T07:57:11Z
format Journal Article
id curtin-20.500.11937-25472
institution Curtin University Malaysia
institution_category Local University
last_indexed 2025-11-14T07:57:11Z
publishDate 2016
recordtype eprints
repository_type Digital Repository
spelling curtin-20.500.11937-25472 2017-09-13T15:16:37Z Approximate Query Processing on High Dimensionality Database Tables Using Multidimensional Cluster Sampling View Inoue, T. Krishna, Aneesh Gopalan, Raj Approximate query processing based on random sampling is one of the most useful methods for efficient computation over large quantities of data kept in databases. However, small samples obtained through random sampling might lack the data relevant to query conditions because the samples do not adequately represent the entire dataset. The Multidimensional Cluster Sampling View has been proposed to support efficient and effective approximate query processing on common database tables. This view allows random sample records to be drawn from a database efficiently and effectively using SQL. The effectiveness of approximate query processing with this view was previously demonstrated on a large database table with only four dimensions. This differs from the usual number of dimensions in decision support systems, which is most commonly over ten. Therefore, further examination and evaluation focusing on dimensionality, with data of ten or more dimensions, is required to demonstrate its practicality. This paper evaluates whether the number of dimensions has an impact on the accuracy of the approximation and on the performance of the Multidimensional Cluster Sampling View. The results of the evaluation show no visible effect of dimensionality. 2016 Journal Article http://hdl.handle.net/20.500.11937/25472 10.17706/jsw.11.1.80-93 restricted
spellingShingle Inoue, T.
Krishna, Aneesh
Gopalan, Raj
Approximate Query Processing on High Dimensionality Database Tables Using Multidimensional Cluster Sampling View
title Approximate Query Processing on High Dimensionality Database Tables Using Multidimensional Cluster Sampling View
title_full Approximate Query Processing on High Dimensionality Database Tables Using Multidimensional Cluster Sampling View
title_fullStr Approximate Query Processing on High Dimensionality Database Tables Using Multidimensional Cluster Sampling View
title_full_unstemmed Approximate Query Processing on High Dimensionality Database Tables Using Multidimensional Cluster Sampling View
title_short Approximate Query Processing on High Dimensionality Database Tables Using Multidimensional Cluster Sampling View
title_sort approximate query processing on high dimensionality database tables using multidimensional cluster sampling view
url http://hdl.handle.net/20.500.11937/25472