Anomaly detection in large-scale data stream networks

This paper addresses the anomaly detection problem in large-scale data mining applications using residual subspace analysis. We are specifically concerned with situations where the full data cannot be practically obtained due to physical limitations such as low bandwidth, limited memory, storage, or...
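As an illustration of the approach the abstract describes (residual subspace analysis applied to randomly projected data), here is a minimal Python sketch. It is not the authors' implementation: the Gaussian projection matrix, the synthetic low-rank data, the injected anomalies, and the values of n, d, k, and m are all assumptions chosen for the demonstration. It also ranks samples by residual energy instead of applying a calibrated threshold.

```python
import numpy as np

def residual_scores(X, k):
    """Squared norm of each row of X outside its top-k principal subspace."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                   # (features, k) basis of the "normal" subspace
    R = Xc - Xc @ P @ P.T          # component left over in the residual subspace
    return np.sum(R ** 2, axis=1)

def random_projection(X, m, rng):
    """Compress each row of X from d to m dimensions with a Gaussian matrix (assumed choice)."""
    d = X.shape[1]
    Phi = rng.standard_normal((d, m)) / np.sqrt(m)
    return X @ Phi

rng = np.random.default_rng(0)
n, d, k, m = 500, 200, 3, 40       # hypothetical sizes: samples, sensors, rank, compressed dim

# Synthetic "normal" data: rank-k structure plus small noise.
basis = rng.standard_normal((k, d))
X = rng.standard_normal((n, k)) @ basis + 0.1 * rng.standard_normal((n, d))

# Inject two anomalies that lie mostly outside the normal subspace.
anomalous = [50, 300]
X[anomalous] += 3.0 * rng.standard_normal((len(anomalous), d))

Y = random_projection(X, m, rng)   # compressed data, 5x fewer columns

s_raw = residual_scores(X, k)      # scores from the full data
s_cs = residual_scores(Y, k)       # scores from the compressed data only

top_raw = set(np.argsort(s_raw)[-2:].tolist())
top_cs = set(np.argsort(s_cs)[-2:].tolist())
print("highest residuals (raw):", sorted(top_raw))
print("highest residuals (compressed):", sorted(top_cs))
```

In this synthetic setting the same two samples carry the largest residual energy whether the scores are computed from the raw or the compressed data, which is the behaviour the abstract claims in approximate form for spectral methods.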


Bibliographic Details
Main Authors: Pham, DucSon; Venkatesh, S.; Lazarescu, Mihai; Budhaditya, S.
Format: Journal Article
Published: Springer 2014
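Subjects: anomaly detection; sensor network data; spectral methods; compressed sensing; stream data processing; residual subspace analysis; random projection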
Online Access: http://hdl.handle.net/20.500.11937/43829
_version_ 1848756820627161088
author Pham, DucSon
Venkatesh, S.
Lazarescu, Mihai
Budhaditya, S.
author_facet Pham, DucSon
Venkatesh, S.
Lazarescu, Mihai
Budhaditya, S.
author_sort Pham, DucSon
building Curtin Institutional Repository
collection Online Access
description This paper addresses the anomaly detection problem in large-scale data mining applications using residual subspace analysis. We are specifically concerned with situations where the full data cannot be practically obtained due to physical limitations such as low bandwidth, limited memory, storage, or computing power. Motivated by the recent compressed sensing (CS) theory, we suggest a framework wherein random projection can be used to obtain compressed data, addressing the scalability challenge. Our theoretical contribution shows that the spectral property of the CS data is approximately preserved under such a projection and thus the performance of spectral-based methods for anomaly detection is almost equivalent to the case in which the raw data is completely available. Our second contribution is the construction of the framework to use this result and detect anomalies in the compressed data directly, thus circumventing the problems of data acquisition in large sensor networks. We have conducted extensive experiments to detect anomalies in network and surveillance applications on large datasets, including the benchmark PETS 2007 and 83 GB of real footage from three public train stations. Our results show that our proposed method is scalable, and importantly, its performance is comparable to conventional methods for anomaly detection when the complete data is available.
first_indexed 2025-11-14T09:18:17Z
format Journal Article
id curtin-20.500.11937-43829
institution Curtin University Malaysia
institution_category Local University
last_indexed 2025-11-14T09:18:17Z
publishDate 2014
publisher Springer
recordtype eprints
repository_type Digital Repository
spelling curtin-20.500.11937-438292019-02-19T05:35:06Z Anomaly detection in large-scale data stream networks Pham, DucSon Venkatesh, S. Lazarescu, Mihai Budhaditya, S. anomaly detection sensor network data spectral methods compressed sensing stream data processing residual subspace analysis random projection This paper addresses the anomaly detection problem in large-scale data mining applications using residual subspace analysis. We are specifically concerned with situations where the full data cannot be practically obtained due to physical limitations such as low bandwidth, limited memory, storage, or computing power. Motivated by the recent compressed sensing (CS) theory, we suggest a framework wherein random projection can be used to obtain compressed data, addressing the scalability challenge. Our theoretical contribution shows that the spectral property of the CS data is approximately preserved under such a projection and thus the performance of spectral-based methods for anomaly detection is almost equivalent to the case in which the raw data is completely available. Our second contribution is the construction of the framework to use this result and detect anomalies in the compressed data directly, thus circumventing the problems of data acquisition in large sensor networks. We have conducted extensive experiments to detect anomalies in network and surveillance applications on large datasets, including the benchmark PETS 2007 and 83 GB of real footage from three public train stations. Our results show that our proposed method is scalable, and importantly, its performance is comparable to conventional methods for anomaly detection when the complete data is available. 2014 Journal Article http://hdl.handle.net/20.500.11937/43829 10.1007/s10618-012-0297-3 Springer fulltext
spellingShingle anomaly detection
sensor network data
spectral methods
compressed sensing
stream data processing
residual subspace analysis
random projection
Pham, DucSon
Venkatesh, S.
Lazarescu, Mihai
Budhaditya, S.
Anomaly detection in large-scale data stream networks
title Anomaly detection in large-scale data stream networks
title_full Anomaly detection in large-scale data stream networks
title_fullStr Anomaly detection in large-scale data stream networks
title_full_unstemmed Anomaly detection in large-scale data stream networks
title_short Anomaly detection in large-scale data stream networks
title_sort anomaly detection in large-scale data stream networks
topic anomaly detection
sensor network data
spectral methods
compressed sensing
stream data processing
residual subspace analysis
random projection
url http://hdl.handle.net/20.500.11937/43829