Collaborative Research: Record Linkage and Privacy-Preserving Methods for Big Data

Basic Information

Project Abstract

This research project will develop sound statistical and machine learning techniques for preserving privacy with linked data. Social entities and their patterns of behavior are a crucial topic in the social sciences. Research in this area has been invigorated by the growth of the modern information infrastructure, the ease of data collection and storage, and the development of novel computational data analysis techniques. However, in many application areas relevant and sensitive information is commonly spread across multiple databases. Data analysis is inherently impossible without merging the databases, but merging comes at the cost of an increased risk of a privacy violation. This research will address the problem of how to perform valid statistical inference in the presence of multiple data sources, data sharing, and privacy in the age of "big data." The investigators' new modeling construct for inference and uncertainty quantification will contribute to both statistics and the many disciplines for which statistics is a principal tool. The methods will have a wide range of applications in the social, economic, and behavioral sciences, including medicine, genetics, official statistics, and the study of human rights violations. The investigators will collaborate with a postdoctoral researcher and with graduate and undergraduate students. The statistical methods will be encapsulated in open-source software packages, allowing off-the-shelf use by practitioners while facilitating more detailed control and extensions.

This interdisciplinary research project will improve upon methods in record linkage and privacy using state-of-the-art techniques from statistics and machine learning. Record linkage is the process of merging possibly noisy databases with the goal of removing duplicate entries. Privacy-preserving record linkage (PPRL) tries to identify records that refer to the same entities across multiple databases without compromising the privacy of the entities represented by these records. The research will focus on three aims: (1) development of new Bayesian methods for PPRL, where the error can be propagated exactly across the entire linkage process and into statistical inference, including new privacy measures that capture the tradeoff between utility and the risk to any individual in a linked database; (2) development of new robust methods for realizing post-linkage synthetic data releases with guarantees of differential privacy and its relaxations, to address additional layers of privacy and support broader data sharing; and (3) exploration of "big data" methods such as variational inference to address the scalability and latent cluster exchangeability issues that arise in linkage and privacy, so that the new methods can scale to multiple large databases. The new methods will be scalable, will assess uncertainty throughout the entire linkage and privacy process, and can be evaluated using Bayesian disclosure risk and Bayesian differential privacy. The project is supported by the Methodology, Measurement, and Statistics Program and a consortium of federal statistical agencies as part of a joint activity to support research on survey and statistical methodology.

Project Outcomes

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
A Latent Class Modeling Approach for Differentially Private Synthetic Data for Contingency Tables
  • DOI:
    10.29012/jpc.768
  • Published:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Nixon, Michelle; Barrientos, Andres; Reiter, Jerome; Slavkovic, Aleksandra
  • Corresponding author:
    Slavkovic, Aleksandra
Other grants by Aleksandra Slavkovic

Formal Privacy for Complex Data Objects
  • Award number:
    1853209
  • Fiscal year:
    2019
  • Funding amount:
    $334,200
  • Project type:
    Standard Grant
CDI-Type II: Collaborative Research: Integrating Statistical and Computational Approaches to Privacy
  • Award number:
    0941553
  • Fiscal year:
    2010
  • Funding amount:
    $334,200
  • Project type:
    Standard Grant
Statistical Disclosure Limitation Methods for Tabular Data
  • Award number:
    0532407
  • Fiscal year:
    2005
  • Funding amount:
    $334,200
  • Project type:
    Standard Grant

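Similar NSFC grants

Research on Quantum Field Theory without a Lagrangian Description
  • Award number:
    24ZR1403900
  • Approval year:
    2024
  • Funding amount:
    CNY 0
  • Project type:
    Provincial/municipal project
Cell Research
  • Award number:
    31224802
  • Approval year:
    2012
  • Funding amount:
    CNY 240,000
  • Project type:
    Special fund project
Cell Research
  • Award number:
    31024804
  • Approval year:
    2010
  • Funding amount:
    CNY 240,000
  • Project type:
    Special fund project
Cell Research
  • Award number:
    30824808
  • Approval year:
    2008
  • Funding amount:
    CNY 240,000
  • Project type:
    Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
  • Award number:
    10774081
  • Approval year:
    2007
  • Funding amount:
    CNY 450,000
  • Project type:
    General Program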

Similar overseas grants

Collaborative Research: Examining Pyrotechnology and Ecosystem Change in the Archaeological Record
  • Award number:
    2413996
  • Fiscal year:
    2023
  • Funding amount:
    $334,200
  • Project type:
    Standard Grant
Collaborative Research: RUI: An undergraduate cohort thermochronology research and mentorship experience investigating the thermo-tectonic record of the northern Klamath Mountains
  • Award number:
    2242862
  • Fiscal year:
    2023
  • Funding amount:
    $334,200
  • Project type:
    Standard Grant
Collaborative Research: RUI: An undergraduate cohort thermochronology research and mentorship experience investigating the thermo-tectonic record of the northern Klamath Mountains
  • Award number:
    2242861
  • Fiscal year:
    2023
  • Funding amount:
    $334,200
  • Project type:
    Standard Grant
Collaborative Research: A 50,000-year continuous record of the Indian Summer Monsoon from Loktak Lake, NE India
  • Award number:
    2303253
  • Fiscal year:
    2023
  • Funding amount:
    $334,200
  • Project type:
    Standard Grant
Collaborative Research: A 50,000-year continuous record of the Indian Summer Monsoon from Loktak Lake, NE India
  • Award number:
    2303255
  • Fiscal year:
    2023
  • Funding amount:
    $334,200
  • Project type:
    Standard Grant
Collaborative Research: A 50,000-year continuous record of the Indian Summer Monsoon from Loktak Lake, NE India
  • Award number:
    2303254
  • Fiscal year:
    2023
  • Funding amount:
    $334,200
  • Project type:
    Standard Grant
Collaborative Research: Reconstructing the missing record of late Proterozoic tectonism along the western margin of Laurentia using deep-time thermochronology
  • Award number:
    2140481
  • Fiscal year:
    2022
  • Funding amount:
    $334,200
  • Project type:
    Standard Grant
Collaborative Research: Do subduction-complex metamorphic rocks record the thermal evolution of a subduction zone or periods of anomalous tectonic activity? Baja California
  • Award number:
    2127229
  • Fiscal year:
    2022
  • Funding amount:
    $334,200
  • Project type:
    Standard Grant
Collaborative Research: Sustaining The Utqiagvik Aerosol Record of Decades (STUARD)
  • Award number:
    2127737
  • Fiscal year:
    2022
  • Funding amount:
    $334,200
  • Project type:
    Continuing Grant
Collaborative Research: Sustaining The Utqiaġvik Aerosol Record of Decades (STUARD)
  • Award number:
    2127733
  • Fiscal year:
    2022
  • Funding amount:
    $334,200
  • Project type:
    Continuing Grant