A novel symbolization technique for time-series outlier detection

Bibliographic Details
Main Authors: Smith, Gavin, Goulding, James
Format: Conference or Workshop Item
Published: 2015
Subjects: Detection; Preprocessing; Symbolization; Quantization; Optimization; Time series; Data mining
Online Access: https://eprints.nottingham.ac.uk/52309/
building Nottingham Research Data Repository
collection Online Access
description The detection of outliers in time series data is a core component of many data-mining applications and is widely used in industry. For large data sets, algorithms that are efficient in both time and space are required. One way to reduce both computation time and storage cost is symbolization as a pre-processing step, which also opens up the use of a range of discrete algorithms. With this common pre-processing step in mind, this work highlights that (1) existing symbolization approaches are designed to address problems other than outlier detection and are hence sub-optimal for it, and (2) the use of off-the-shelf symbolization techniques can therefore lead to significant unnecessary data corruption and potential performance loss when outlier detection is a key aspect of the data mining task at hand. To address this, a novel symbolization method is introduced that specifically targets the end-use application of outlier detection. The method is empirically shown to outperform existing approaches.
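The data-corruption concern in point (2) of the abstract can be illustrated with a minimal sketch. It assumes a generic equal-frequency (quantile) quantizer as a stand-in for an off-the-shelf symbolization technique; the record does not name a specific technique, and this is not the authors' proposed method. The function name `equal_frequency_symbolize`, the four-letter alphabet, and the toy series are all hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def equal_frequency_symbolize(series, alphabet="abcd"):
    """Map each value to a symbol using equal-frequency (quantile) breakpoints.

    Assumption: this generic quantizer stands in for an off-the-shelf
    symbolization step; it is not the method described in the paper.
    """
    # quantiles(..., n=k) returns k-1 cut points that split the data
    # into k roughly equally populated bins.
    breakpoints = quantiles(series, n=len(alphabet))
    # bisect_right gives the index of the bin each value falls into.
    symbols = "".join(alphabet[bisect_right(breakpoints, x)] for x in series)
    return symbols, breakpoints


# Hypothetical toy series: a slow ramp with one large spike at index 10.
series = [0.1 * i for i in range(20)]
series[10] = 50.0

symbols, breakpoints = equal_frequency_symbolize(series)

# Values that received the highest symbol. The spike lands here together
# with ordinary high values, so the symbol string alone cannot tell them apart.
top_bin = [x for x, s in zip(series, symbols) if s == alphabet_top] \
    if (alphabet_top := "d") else []

print(symbols)
print(top_bin)
```

In this sketch the spike and the ordinary values at the top of the ramp receive the same symbol, so a discrete outlier detector running on the symbol string has lost the information that distinguishes them. That is the kind of loss the abstract attributes to symbolizers designed for other purposes.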
id nottingham-52309
institution University of Nottingham Malaysia Campus
institution_category Local University
recordtype eprints
repository_type Digital Repository
spelling nottingham-52309 2020-05-04T17:18:35Z
Smith, Gavin and Goulding, James (2015) A novel symbolization technique for time-series outlier detection. In: 2015 IEEE International Conference on Big Data, Oct 29 - Nov 1 2015, Santa Clara, California, USA.
2015-10-29 Conference or Workshop Item PeerReviewed
https://ieeexplore.ieee.org/document/7364037/
doi:10.1109/BigData.2015.7364037
topic Detection; Preprocessing; Symbolization; Quantization; Optimization; Time series; Data mining
url https://eprints.nottingham.ac.uk/52309/