Randomized Response and Balanced Bloom Filters for Privacy Preserving Record Linkage

Bibliographic Details
Main Authors: Schnell, Rainer; Borgs, Christian
Format: Conference Paper
Published: 2017
Online Access: http://hdl.handle.net/20.500.11937/71413
building Curtin Institutional Repository
collection Online Access
description © 2016 IEEE. In most European settings, record linkage across different institutions is based on encrypted personal identifiers, such as names, birthdays, or places of birth, to protect privacy. However, in practice up to 20% of the records may contain errors in identifiers. Thus, exact record linkage on encrypted identifiers usually results in the loss of large subsets of the data. Such losses usually imply biased statistical estimates, since the causes of errors might be correlated with the variables of interest in many applications. Over the past 10 years, the field of Privacy Preserving Record Linkage (PPRL) has developed different techniques to link data without revealing the identity of the described entity. However, only a few techniques are suitable for applied research with large databases that include millions of records, which is typical for administrative or medical databases. Bloom filters were found to be one successful technique for PPRL in large-scale applications. Yet, Bloom filters have been subject to cryptographic attacks. Previous research has shown that the straightforward application of Bloom filters carries a non-zero re-identification risk. We present new results on recently developed techniques that defy all known attacks on PPRL Bloom filters. These computationally inexpensive algorithms modify personal identifiers by combining different cryptographic techniques. The paper describes these new algorithms and demonstrates their performance in terms of precision, recall, and re-identification risk on large databases.
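The abstract does not spell out the encoding itself. As a rough illustration only, the Python sketch below shows a commonly described bigram Bloom filter encoding for PPRL, a Dice-coefficient comparison, and two hardening steps in the spirit of the title: balancing (appending the bitwise complement and applying a shared secret permutation) and randomized response (replacing each bit by a fair coin flip with probability f). These steps are assumptions based on general descriptions of the techniques, not the authors' exact algorithms; the keys, filter length, number of hash functions, seeds, and function names (`bigrams`, `encode`, `dice`, `balance`, `randomized_response`) are hypothetical.

```python
import hashlib
import hmac
import random

# Hypothetical parameters for illustration only; real deployments choose their own.
FILTER_LENGTH = 1000  # bits per Bloom filter
NUM_HASHES = 20       # hash functions per bigram
KEY1, KEY2 = b"hypothetical-key-1", b"hypothetical-key-2"  # secret keys shared by data holders


def bigrams(value: str) -> list[str]:
    """Split a normalised identifier into padded bigrams: 'ann' -> ['_a', 'an', 'nn', 'n_']."""
    padded = f"_{value.strip().lower()}_"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def encode(value: str) -> list[int]:
    """Set up to NUM_HASHES bits per bigram via keyed double hashing (HMAC-SHA1 / HMAC-MD5)."""
    bits = [0] * FILTER_LENGTH
    for gram in bigrams(value):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(KEY1, data, hashlib.sha1).digest(), "big")
        h2 = int.from_bytes(hmac.new(KEY2, data, hashlib.md5).digest(), "big")
        for i in range(NUM_HASHES):
            bits[(h1 + i * h2) % FILTER_LENGTH] = 1
    return bits


def dice(a: list[int], b: list[int]) -> float:
    """Dice coefficient 2 * |A & B| / (|A| + |B|) of two equal-length bit lists."""
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * common / total if total else 0.0


def balance(bits: list[int], seed: int = 42) -> list[int]:
    """Append the bitwise complement, then permute with a secret seed shared by all parties.

    Every balanced filter has exactly len(bits) ones, so the Hamming weight no longer
    reveals how many bigrams (roughly, how long an identifier) were encoded.
    """
    combined = bits + [1 - b for b in bits]
    order = list(range(len(combined)))
    random.Random(seed).shuffle(order)  # same seed and length -> same permutation
    return [combined[j] for j in order]


def randomized_response(bits: list[int], f: float, rng: random.Random) -> list[int]:
    """With probability f replace each bit by a fair coin flip; otherwise keep it."""
    return [rng.randint(0, 1) if rng.random() < f else b for b in bits]


if __name__ == "__main__":
    a, b, c = encode("smith"), encode("smyth"), encode("miller")
    print("smith/smyth :", round(dice(a, b), 3))  # four shared bigrams
    print("smith/miller:", round(dice(a, c), 3))  # one shared bigram ("mi")
    noisy = randomized_response(balance(a), f=0.1, rng=random.Random(7))
    print("balanced ones:", sum(balance(a)), "of", 2 * FILTER_LENGTH)
    print("balanced vs noisy:", round(dice(balance(a), noisy), 3))
```

In this sketch the HMAC keys and the permutation seed act as shared secrets: all data holders would need the same values for their filters to stay comparable. The parameter f trades linkage quality (lower Dice scores between true matches) for stronger protection of individual bit patterns.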
first_indexed 2025-11-14T10:48:08Z
id curtin-20.500.11937-71413
institution Curtin University Malaysia
institution_category Local University
last_indexed 2025-11-14T10:48:08Z
recordtype eprints
repository_type Digital Repository
datestamp 2018-12-13T09:32:07Z
doi 10.1109/ICDMW.2016.0038
access restricted