An improved system for sentence-level novelty detection in textual streams

Bibliographic Details
Main Authors: Fu, Xinyu; Ch'ng, Eugene; Aickelin, Uwe
Format: Conference or Workshop Item
Published: 2016
Subjects:
Online Access: https://eprints.nottingham.ac.uk/30452/
author Fu, Xinyu
Ch'ng, Eugene
Aickelin, Uwe
building Nottingham Research Data Repository
collection Online Access
description Novelty detection in news events has long been a difficult problem. A number of models have performed well on specific data streams, but certain issues remain far from solved, particularly in large data streams from the WWW, where the unpredictable appearance of new terms requires the vector space model to adapt. We present a novel event detection system based on Incremental Term Frequency-Inverse Document Frequency (TF-IDF) weighting combined with Locality Sensitive Hashing (LSH). Our system can adapt efficiently and effectively to new terms appearing in the data streams by continually updating the vector space model. In terms of miss probability, our proposed novelty detection framework outperforms a recognised baseline system by approximately 16% on a benchmark dataset from Google News.
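The description names two components, incremental TF-IDF weighting and Locality Sensitive Hashing, but this record gives no implementation details. The Python sketch below shows one way the two ideas can fit together and is not the authors' system. Assumptions: the class `IncrementalNoveltyDetector`, its parameters (`num_tables`, `num_bits`, `seed`) and the helpers `tokenize` and `miss_probability` are hypothetical; the IDF uses the smoothed form ln((1 + N) / (1 + df)) + 1; the LSH family is random-hyperplane hashing for cosine similarity, with each new term given fresh Gaussian components when first seen so the projection grows with the vocabulary; novelty is scored as 1 - (maximum cosine similarity to LSH candidates). `miss_probability` uses the usual definition (missed novel items / all novel items); the record does not state the paper's exact evaluation protocol.

```python
import math
import random
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs, no stemming or stop words.
    return re.findall(r"[a-z0-9]+", text.lower())


class IncrementalNoveltyDetector:
    """Hypothetical sketch: incremental TF-IDF plus random-hyperplane LSH."""

    def __init__(self, num_tables=4, num_bits=8, seed=0):
        self.num_tables = num_tables
        self.num_bits = num_bits
        self.rng = random.Random(seed)
        self.n_docs = 0          # documents seen so far (N)
        self.df = Counter()      # document frequency per term
        self.planes = {}         # term -> one Gaussian component per hyperplane
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.docs = []           # stored raw term-frequency Counters

    def _idf(self, term):
        # Smoothed IDF recomputed from the current counts, so weights follow the stream.
        return math.log((1 + self.n_docs) / (1 + self.df[term])) + 1.0

    def _vector(self, tf):
        return {t: c * self._idf(t) for t, c in tf.items()}

    def _components(self, term):
        # Unseen terms get new random components: the vector space grows on demand.
        if term not in self.planes:
            total = self.num_tables * self.num_bits
            self.planes[term] = [self.rng.gauss(0.0, 1.0) for _ in range(total)]
        return self.planes[term]

    def _signatures(self, vec):
        # Sign of the projection onto each hyperplane gives one bit;
        # bits are split into num_tables keys of num_bits each.
        total = self.num_tables * self.num_bits
        dots = [0.0] * total
        for term, w in vec.items():
            for i, r in enumerate(self._components(term)):
                dots[i] += w * r
        bits = [d >= 0.0 for d in dots]
        return [tuple(bits[k * self.num_bits:(k + 1) * self.num_bits])
                for k in range(self.num_tables)]

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def process(self, text):
        """Return a novelty score in [0, 1] for text, then index it."""
        tf = Counter(tokenize(text))
        # Incremental step: update N and df before weighting the new item.
        self.n_docs += 1
        self.df.update(tf.keys())
        vec = self._vector(tf)
        sigs = self._signatures(vec)
        # Only items sharing a bucket in some table are compared exactly.
        candidates = set()
        for table, sig in zip(self.tables, sigs):
            candidates.update(table[sig])
        best = max((self._cosine(vec, self._vector(self.docs[i])) for i in candidates),
                   default=0.0)
        doc_id = len(self.docs)
        self.docs.append(tf)
        for table, sig in zip(self.tables, sigs):
            table[sig].append(doc_id)
        return 1.0 - best


def miss_probability(is_novel, flagged):
    """Fraction of truly novel items that were not flagged as novel."""
    novel_flags = [f for n, f in zip(is_novel, flagged) if n]
    if not novel_flags:
        return 0.0
    return sum(1 for f in novel_flags if not f) / len(novel_flags)


if __name__ == "__main__":
    detector = IncrementalNoveltyDetector(seed=42)
    for sentence in ["Earthquake strikes near the coast",
                     "Earthquake strikes near the coast",
                     "Central bank raises interest rates"]:
        print(f"{detector.process(sentence):.3f}  {sentence}")
```

Storing raw term counts rather than fixed vectors lets each comparison use the current IDF values, which is one reading of "continual updates to the vector space model"; bucket keys, however, reflect the weights at insertion time, so candidate retrieval stays approximate. A decision threshold on the score (not given in this record) would turn scores into the flags that `miss_probability` expects.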
first_indexed 2025-11-14T19:09:02Z
format Conference or Workshop Item
id nottingham-30452
institution University of Nottingham Malaysia Campus
institution_category Local University
last_indexed 2025-11-14T19:09:02Z
publishDate 2016
recordtype eprints
repository_type Digital Repository
spelling nottingham-30452 2020-05-04T17:47:27Z
https://eprints.nottingham.ac.uk/30452/
2016-04-07 Conference or Workshop Item PeerReviewed
Fu, Xinyu, Ch'ng, Eugene and Aickelin, Uwe (2016) An improved system for sentence-level novelty detection in textual streams. In: 3rd International Conference on Smart Sustainable City and Big Data (ICSSC), 27-28 July 2015, Shanghai, China.
http://ieeexplore.ieee.org/document/7446433/
doi:10.1049/cp.2015.0250
title An improved system for sentence-level novelty detection in textual streams
topic first story detection
novelty detection
Locality Sensitive Hashing
text mining
url https://eprints.nottingham.ac.uk/30452/