Provenance network analytics: an approach to data analytics using data provenance

Provenance network analytics is a novel data analytics approach that helps infer properties of data, such as quality or importance, from their provenance. Instead of analysing application data, which are typically domain-dependent, it analyses the data's provenance as represented using the Worl...


Bibliographic Details
Main Authors: Huynh, Trung Dong; Ebden, Mark; Fischer, Joel E.; Roberts, Stephen; Moreau, Luc
Format: Article
Published: Springer 2018
Subjects: data provenance; data analytics; network metrics; graph classification
Online Access: https://eprints.nottingham.ac.uk/48901/
_version_ 1848797874462130176
author Huynh, Trung Dong
Ebden, Mark
Fischer, Joel E.
Roberts, Stephen
Moreau, Luc
author_facet Huynh, Trung Dong
Ebden, Mark
Fischer, Joel E.
Roberts, Stephen
Moreau, Luc
author_sort Huynh, Trung Dong
building Nottingham Research Data Repository
collection Online Access
description Provenance network analytics is a novel data analytics approach that helps infer properties of data, such as quality or importance, from their provenance. Instead of analysing application data, which are typically domain-dependent, it analyses the data's provenance as represented using the World Wide Web Consortium's domain-agnostic PROV data model. Specifically, the approach proposes a number of network metrics for provenance data and applies established machine learning techniques over such metrics to build predictive models for some key properties of data. Applying this method to the provenance of real-world data from three different applications, we show that it can successfully identify the owners of provenance documents, assess the quality of crowdsourced data, and identify instructions from chat messages in an alternate-reality game with high levels of accuracy. By so doing, we demonstrate the different ways the proposed provenance network metrics can be used in analysing data, providing the foundation for provenance-based data analytics.
first_indexed 2025-11-14T20:10:49Z
format Article
id nottingham-48901
institution University of Nottingham Malaysia Campus
institution_category Local University
last_indexed 2025-11-14T20:10:49Z
publishDate 2018
publisher Springer
recordtype eprints
repository_type Digital Repository
spelling nottingham-48901
2020-05-04T19:32:31Z
https://eprints.nottingham.ac.uk/48901/
Provenance network analytics: an approach to data analytics using data provenance
Huynh, Trung Dong
Ebden, Mark
Fischer, Joel E.
Roberts, Stephen
Moreau, Luc
Provenance network analytics is a novel data analytics approach that helps infer properties of data, such as quality or importance, from their provenance. Instead of analysing application data, which are typically domain-dependent, it analyses the data's provenance as represented using the World Wide Web Consortium's domain-agnostic PROV data model. Specifically, the approach proposes a number of network metrics for provenance data and applies established machine learning techniques over such metrics to build predictive models for some key properties of data. Applying this method to the provenance of real-world data from three different applications, we show that it can successfully identify the owners of provenance documents, assess the quality of crowdsourced data, and identify instructions from chat messages in an alternate-reality game with high levels of accuracy. By so doing, we demonstrate the different ways the proposed provenance network metrics can be used in analysing data, providing the foundation for provenance-based data analytics.
Springer
2018-02-15
Article
PeerReviewed
Huynh, Trung Dong, Ebden, Mark, Fischer, Joel E., Roberts, Stephen and Moreau, Luc (2018) Provenance network analytics: an approach to data analytics using data provenance. Data Mining and Knowledge Discovery. ISSN 1573-756X
data provenance; data analytics; network metrics; graph classification
https://link.springer.com/article/10.1007/s10618-017-0549-3
doi:10.1007/s10618-017-0549-3
spellingShingle data provenance; data analytics; network metrics; graph classification
Huynh, Trung Dong
Ebden, Mark
Fischer, Joel E.
Roberts, Stephen
Moreau, Luc
Provenance network analytics: an approach to data analytics using data provenance
title Provenance network analytics: an approach to data analytics using data provenance
title_full Provenance network analytics: an approach to data analytics using data provenance
title_fullStr Provenance network analytics: an approach to data analytics using data provenance
title_full_unstemmed Provenance network analytics: an approach to data analytics using data provenance
title_short Provenance network analytics: an approach to data analytics using data provenance
title_sort provenance network analytics: an approach to data analytics using data provenance
topic data provenance; data analytics; network metrics; graph classification
url https://eprints.nottingham.ac.uk/48901/
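
Illustrative sketch: the description above outlines a two-step approach, computing network metrics over a provenance graph and then feeding those metrics to an established machine learning technique. The record does not list the actual metrics or models used, so everything below is an assumption made for illustration: the metric set (node count, edge count, counts per PROV node type, longest path), the toy graphs, the labels "derived" and "raw", and the nearest-centroid classifier, which merely stands in for whatever technique the article applies. Only the three node types (entity, activity, agent) come from the W3C PROV data model.

```python
from collections import Counter

# The three core node types of the W3C PROV data model.
PROV_TYPES = ("entity", "activity", "agent")


def provenance_metrics(nodes, edges):
    """Return a small feature vector for one provenance graph.

    nodes: dict mapping node id -> PROV type
    edges: list of (source, target) pairs, assumed to form a DAG
    The choice of metrics is hypothetical, not taken from the article.
    """
    type_counts = Counter(nodes.values())
    successors = {node: [] for node in nodes}
    for source, target in edges:
        successors[source].append(target)

    memo = {}

    def longest_from(node):
        # Longest path (in edges) starting at `node`, memoised.
        if node not in memo:
            memo[node] = max(
                (1 + longest_from(nxt) for nxt in successors[node]), default=0
            )
        return memo[node]

    longest_path = max((longest_from(node) for node in nodes), default=0)
    return [
        len(nodes),
        len(edges),
        *(type_counts[t] for t in PROV_TYPES),
        longest_path,
    ]


def fit_centroids(vectors, labels):
    """Average the feature vectors of each label (stand-in classifier)."""
    groups = {}
    for vector, label in zip(vectors, labels):
        groups.setdefault(label, []).append(vector)
    return {
        label: [sum(column) / len(members) for column in zip(*members)]
        for label, members in groups.items()
    }


def predict(centroids, vector):
    """Return the label whose centroid is closest in squared distance."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], vector))
    return min(centroids, key=squared_distance)


# Hypothetical toy graphs; edges point from a node to what it depends on.
nodes_a = {"e1": "entity", "e2": "entity", "a1": "activity", "ag1": "agent"}
edges_a = [("e2", "a1"), ("a1", "e1"), ("a1", "ag1")]
nodes_b = {"e1": "entity"}
edges_b = []
nodes_c = {"e1": "entity", "e2": "entity", "a1": "activity"}
edges_c = [("e2", "a1"), ("a1", "e1")]

centroids = fit_centroids(
    [provenance_metrics(nodes_a, edges_a), provenance_metrics(nodes_b, edges_b)],
    ["derived", "raw"],
)
prediction = predict(centroids, provenance_metrics(nodes_c, edges_c))
print(prediction)
```

The classifier sees only graph-shaped features, never the application data itself, which is the domain-agnostic property the description emphasises. In practice the feature set and the learner would be replaced by those reported in the article.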