Use of graph theory measures to identify errors in record linkage
Ensuring high linkage quality is important in many record linkage applications. Current methods for ensuring quality are manual and resource intensive. This paper seeks to determine the effectiveness of graph theory techniques in identifying record linkage errors. A range of graph theory techniques...
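| Main Authors: | Randall, Sean; Boyd, James; Ferrante, Anna; Bauer, J.; Semmens, James |
|---|---|
| Format: | Journal Article |
| Published: | Elsevier Ireland Ltd 2014 |
| Subjects: | Record linkage; Graph theory; Data quality |
| Online Access: | http://hdl.handle.net/20.500.11937/3205 |
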
| _version_ | 1848744167752073216 |
|---|---|
| author | Randall, Sean; Boyd, James; Ferrante, Anna; Bauer, J.; Semmens, James |
| author_sort | Randall, Sean |
| building | Curtin Institutional Repository |
| collection | Online Access |
| description | Ensuring high linkage quality is important in many record linkage applications. Current methods for ensuring quality are manual and resource intensive. This paper seeks to determine the effectiveness of graph theory techniques in identifying record linkage errors. A range of graph theory techniques was applied to two linked datasets, with known truth sets. The ability of graph theory techniques to identify groups containing errors was compared to a widely used threshold setting technique. This methodology shows promise; however, further investigations into graph theory techniques are required. The development of more efficient and effective methods of improving linkage quality will result in higher quality datasets that can be delivered to researchers in shorter timeframes. |
| first_indexed | 2025-11-14T05:57:10Z |
| format | Journal Article |
| id | curtin-20.500.11937-3205 |
| institution | Curtin University Malaysia |
| institution_category | Local University |
| last_indexed | 2025-11-14T05:57:10Z |
| publishDate | 2014 |
| publisher | Elsevier Ireland Ltd |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | curtin-20.500.11937-3205; 2019-02-19T04:26:00Z; Use of graph theory measures to identify errors in record linkage; Randall, Sean; Boyd, James; Ferrante, Anna; Bauer, J.; Semmens, James; Record linkage; Graph theory; Data quality; Ensuring high linkage quality is important in many record linkage applications. Current methods for ensuring quality are manual and resource intensive. This paper seeks to determine the effectiveness of graph theory techniques in identifying record linkage errors. A range of graph theory techniques was applied to two linked datasets, with known truth sets. The ability of graph theory techniques to identify groups containing errors was compared to a widely used threshold setting technique. This methodology shows promise; however, further investigations into graph theory techniques are required. The development of more efficient and effective methods of improving linkage quality will result in higher quality datasets that can be delivered to researchers in shorter timeframes.; 2014; Journal Article; http://hdl.handle.net/20.500.11937/3205; 10.1016/j.cmpb.2014.03.008; Elsevier Ireland Ltd; fulltext |
| title | Use of graph theory measures to identify errors in record linkage |
| topic | Record linkage; Graph theory; Data quality |
| url | http://hdl.handle.net/20.500.11937/3205 |