Supporting Regularized Logistic Regression Privately and Efficiently

As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, social sciences, information technology, and so on. These domains often involve data of human subjects that are contingent upon strict privacy regulatio...


Bibliographic Details
Main Authors: Li, Wenfa, Liu, Hongzhe, Yang, Peng, Xie, Wei
Format: Online
Language: English
Published: Public Library of Science 2016
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4894560/
id pubmed-4894560
recordtype oai_dc
spelling pubmed-4894560
2016-06-23
Supporting Regularized Logistic Regression Privately and Efficiently
Li, Wenfa
Liu, Hongzhe
Yang, Peng
Xie, Wei
Research Article
Public Library of Science
2016-06-06
/pmc/articles/PMC4894560/
/pubmed/27271738
http://dx.doi.org/10.1371/journal.pone.0156479
Text
en
© 2016 Li et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
repository_type Open Access Journal
institution_category Foreign Institution
institution US National Center for Biotechnology Information
building NCBI PubMed
collection Online Access
language English
format Online
author Li, Wenfa
Liu, Hongzhe
Yang, Peng
Xie, Wei
title Supporting Regularized Logistic Regression Privately and Efficiently
description As one of the most popular statistical and machine learning models, regularized logistic regression has found wide adoption in biomedicine, the social sciences, information technology, and other fields. These domains often involve human-subject data that are subject to strict privacy regulations. Concerns over data privacy make it increasingly difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work focuses on safeguarding regularized logistic regression, a widely used statistical model that has nonetheless not been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as the research consortia or networks widely seen in genetics, epidemiology, and the social sciences. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method that leverages distributed computing and strong cryptography to provide comprehensive protection of both individual-level and summary data. Extensive empirical evaluations on several studies validate the privacy guarantee, efficiency, and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications in various disciplines, including genetic and biomedical studies, smart grids, and network analysis.
publisher Public Library of Science
publishDate 2016
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4894560/