Towards optimal symbolization for time series comparisons

The abundance of large time series data sets, and the value of mining them, have long been acknowledged. These data sets are ubiquitous in fields ranging from astronomy and biology to web science, and their size and number continue to increase, a situation exacerbated by the exponential growth of our digital footprints. The...


Bibliographic Details
Main Authors:	Smith, Gavin; Goulding, James; Barrack, Duncan
Format:	Conference or Workshop Item
Published:	2013
Subjects:	Time series analysis; Quantization (signal); Equations; Mathematical model; Data mining; Approximation methods; Simulated annealing
Online Access:	https://eprints.nottingham.ac.uk/52220/

_version_	1848798676086947840
author	Smith, Gavin; Goulding, James; Barrack, Duncan
building	Nottingham Research Data Repository
collection	Online Access
description	The abundance of large time series data sets, and the value of mining them, have long been acknowledged. These data sets are ubiquitous in fields ranging from astronomy and biology to web science, and their size and number continue to increase, a situation exacerbated by the exponential growth of our digital footprints. The prevalence and potential utility of this data have led to a vast number of time series data mining techniques, many of which require symbolization of the raw time series as a pre-processing step, for which a number of well-used, pre-existing approaches from the literature are typically employed. In this work we note that these standard approaches are sub-optimal in (at least) the broad application area of time series comparison, leading to unnecessary data corruption and potential performance loss before any real data mining takes place. Addressing this, we present a novel quantizer based upon optimization of comparison fidelity, and a computationally tractable algorithm for its implementation on big data sets. We demonstrate empirically that our new approach provides a statistically significant reduction in the amount of error introduced by the symbolization process compared to the current state of the art. The approach therefore provides a more accurate input for the vast number of data mining techniques in the literature, offering the potential for increased real-world performance across a wide range of existing data mining algorithms and applications.
first_indexed	2025-11-14T20:23:33Z
format	Conference or Workshop Item
id	nottingham-52220
institution	University of Nottingham Malaysia Campus
institution_category	Local University
last_indexed	2025-11-14T20:23:33Z
publishDate	2013
recordtype	eprints
repository_type	Digital Repository
spelling	nottingham-52220 2020-05-04T16:40:43Z https://eprints.nottingham.ac.uk/52220/ Towards optimal symbolization for time series comparisons. Smith, Gavin; Goulding, James; Barrack, Duncan. 2013-12-07. Conference or Workshop Item. PeerReviewed. Smith, Gavin, Goulding, James and Barrack, Duncan (2013) Towards optimal symbolization for time series comparisons. In: IEEE 13th International Conference on Data Mining Workshops (ICDMW 2013), 7-10 Dec 2013, Dallas, Texas, USA. https://doi.org/10.1109/ICDMW.2013.59
title	Towards optimal symbolization for time series comparisons
topic	Time series analysis; Quantization (signal); Equations; Mathematical model; Data mining; Approximation methods; Simulated annealing
url	https://eprints.nottingham.ac.uk/52220/

Similar Items