Unsupervised record matching with noisy and incomplete data

We consider the problem of duplicate detection in noisy and incomplete data: given a large data set in which each record has multiple entries (attributes), detect which distinct records refer to the same real-world entity. This task is complicated by noise (such as misspellings) and missing data, wh...

Bibliographic Details
Main Authors: van Gennip, Yves; Hunter, Blake; Ma, Anna; Moyer, Dan; de Vera, Ryan; Bertozzi, Andrea L.
Format: Article
Published: Springer 2018
Online Access: https://eprints.nottingham.ac.uk/51471/