Use of graph theory measures to identify errors in record linkage

Bibliographic Details
Main Authors: Randall, Sean, Boyd, James, Ferrante, Anna, Bauer, J., Semmens, James
Format: Journal Article
Published: Elsevier Ireland Ltd 2014
Subjects: Record linkage; Graph theory; Data quality
Online Access: http://hdl.handle.net/20.500.11937/3205
DOI: 10.1016/j.cmpb.2014.03.008
Repository: Curtin Institutional Repository
Institution: Curtin University Malaysia

Abstract

Ensuring high linkage quality is important in many record linkage applications. Current methods for ensuring quality are manual and resource intensive. This paper seeks to determine the effectiveness of graph theory techniques in identifying record linkage errors. A range of graph theory techniques was applied to two linked datasets, with known truth sets. The ability of graph theory techniques to identify groups containing errors was compared to a widely used threshold setting technique. This methodology shows promise; however, further investigations into graph theory techniques are required. The development of more efficient and effective methods of improving linkage quality will result in higher quality datasets that can be delivered to researchers in shorter timeframes.