Advanced Theory and Methods for Evaluating the Utility and Privacy Risks of Synthetic Health Data

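Basic information

  • Grant number:
    RGPIN-2022-04811
  • Principal investigator:
  • Amount:
    $17,500
  • Host institution:
  • Host institution country:
    Canada
  • Program type:
    Discovery Grants Program - Individual
  • Fiscal year:
    2022
  • Funding country:
    Canada
  • Duration:
    2022-01-01 to 2023-12-31
  • Project status:
    Completed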

Project abstract

Access to health data for secondary purposes remains a challenge because of privacy concerns. Synthetic data generation (SDG) has been proposed to enable data sharing that is believed to have low identification risks because there is no one-to-one mapping to real individuals. However, if the generative models used to generate synthetic data are overfit, or if a dataset is categorical with a small number of possible combinations of values, then real records may be generated. The adoption of SDG will also depend on demonstrating the utility of the generated data. Utility is broadly defined as the ability to replicate the conclusions from the analysis of real data on synthetic data. SDG needs to simultaneously optimize on privacy and utility. However, thus far SDG loss functions have largely been focused on maximizing utility, and privacy risks are often assessed after the data are generated. The purpose of this program is to develop a unified privacy framework for SDG, and to evaluate and improve current utility metrics. These results would then be used to define and test a combined loss metric that can be applied to optimize the generation of synthetic data which allows for the simultaneous management of privacy and utility.

Privacy Evaluation

Our focus in this program will be on identity disclosure conditional on attribute disclosure and membership disclosure. We will develop and validate a unified risk model that integrates identity, attribute, and membership disclosure. Currently there are no privacy models that are directly applicable to longitudinal synthetic datasets. The unified model of disclosure above will be extended to longitudinal data with multiple heterogeneous events per patient. Existing approaches used in the disclosure control literature will be incorporated into the synthetic data privacy model.
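The abstract's remark that a categorical dataset with few possible value combinations can end up reproducing real records can be illustrated with a minimal sketch. The function name `replication_rate` and the toy records are hypothetical, and exact-match counting is only a crude proxy for disclosure risk, not the unified risk model the program proposes.

```python
from itertools import product


def replication_rate(real, synthetic):
    """Fraction of synthetic records that exactly match some real record."""
    real_records = set(real)  # rows must be hashable, e.g. tuples
    matches = sum(1 for row in synthetic if row in real_records)
    return matches / len(synthetic)


# Toy data (hypothetical): two binary attributes allow only four combinations.
real = [("F", "yes"), ("F", "no"), ("M", "yes")]

# A generator that covers every possible combination necessarily
# reproduces each distinct real record.
synthetic = list(product(["F", "M"], ["yes", "no"]))

rate = replication_rate(real, synthetic)
print(rate)
```

With so few possible combinations, three of the four synthetic rows coincide with real rows even though no row was copied on purpose.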
Utility Evaluation

Utility metrics can serve multiple purposes such as model optimization and synthetic dataset evaluation to accept or reject specific generated datasets. In this part of the program, current utility metrics will be empirically evaluated. The results will clarify which utility metrics are useful for optimization, and synthesized dataset acceptance/rejection. Currently, there has been a dearth of work on evaluating the utility of synthetic longitudinal data. Simple approaches such as concordance between k-order Markov chains capture some structural properties, but do not provide measures related to analytic workloads. This program of research will extend and evaluate the utility metrics for longitudinal data.

Risk-Utility Optimization

With appropriately defined privacy and utility metrics, a combined risk-utility measure can be defined and used as an optimization criterion for SDG algorithms. This will ensure that generated synthetic data satisfy both criteria by construction. Such a measure will be evaluated on common SDG algorithms used on health data.
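As a rough illustration of what a combined risk-utility measure could look like, the sketch below scores a synthetic dataset by a simple utility proxy minus a weighted risk value. Everything here is an assumption made for illustration: the utility proxy (one minus the mean total variation distance between per-column marginals), the linear combination, the `risk_weight` parameter, and the toy data. The abstract does not specify the program's actual metrics.

```python
from collections import Counter


def marginal_utility(real, synthetic):
    """1 minus the mean total variation distance between column marginals."""
    n_cols = len(real[0])
    distances = []
    for j in range(n_cols):
        p = Counter(row[j] for row in real)
        q = Counter(row[j] for row in synthetic)
        values = set(p) | set(q)
        tvd = 0.5 * sum(
            abs(p[v] / len(real) - q[v] / len(synthetic)) for v in values
        )
        distances.append(tvd)
    return 1.0 - sum(distances) / n_cols


def risk_utility_score(utility, risk, risk_weight=1.0):
    """Hypothetical combined criterion: higher is better."""
    return utility - risk_weight * risk


# Toy data (hypothetical): categorical records stored as tuples.
real = [("F", "A"), ("M", "B"), ("F", "A"), ("M", "B")]
synthetic = [("F", "A"), ("F", "B"), ("F", "A"), ("M", "B")]

utility = marginal_utility(real, synthetic)
# The risk value would come from a separate privacy metric; 0.25 is a placeholder.
score = risk_utility_score(utility, risk=0.25)
print(utility, score)
```

A generator could then be tuned to maximize such a score, so that privacy is considered during generation instead of only afterwards.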

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants held by ElEmam, Khaled

Advanced theory and methods for the de-identification of small cohorts, complex and composed health data
  • Grant number:
    RGPIN-2016-06781
  • Fiscal year:
    2021
  • Funding amount:
    $17,500
  • Program type:
    Discovery Grants Program - Individual
Advanced theory and methods for the de-identification of small cohorts, complex and composed health data
  • Grant number:
    RGPIN-2016-06781
  • Fiscal year:
    2020
  • Funding amount:
    $17,500
  • Program type:
    Discovery Grants Program - Individual
Advanced theory and methods for the de-identification of small cohorts, complex and composed health data
  • Grant number:
    RGPIN-2016-06781
  • Fiscal year:
    2019
  • Funding amount:
    $17,500
  • Program type:
    Discovery Grants Program - Individual
Advanced theory and methods for the de-identification of small cohorts, complex and composed health data
  • Grant number:
    RGPIN-2016-06781
  • Fiscal year:
    2018
  • Funding amount:
    $17,500
  • Program type:
    Discovery Grants Program - Individual
Advanced theory and methods for the de-identification of small cohorts, complex and composed health data
  • Grant number:
    RGPIN-2016-06781
  • Fiscal year:
    2017
  • Funding amount:
    $17,500
  • Program type:
    Discovery Grants Program - Individual
Advanced theory and methods for the de-identification of small cohorts, complex and composed health data
  • Grant number:
    RGPIN-2016-06781
  • Fiscal year:
    2016
  • Funding amount:
    $17,500
  • Program type:
    Discovery Grants Program - Individual
Metrics and methods for the de-identification of health information
  • Grant number:
    186936-2011
  • Fiscal year:
    2015
  • Funding amount:
    $17,500
  • Program type:
    Discovery Grants Program - Individual
Metrics and methods for the de-identification of health information
  • Grant number:
    186936-2011
  • Fiscal year:
    2014
  • Funding amount:
    $17,500
  • Program type:
    Discovery Grants Program - Individual
Electronic Health Information
  • Grant number:
    1000216983-2009
  • Fiscal year:
    2014
  • Funding amount:
    $17,500
  • Program type:
    Canada Research Chairs
Electronic Health Information
  • Grant number:
    1000216983-2009
  • Fiscal year:
    2013
  • Funding amount:
    $17,500
  • Program type:
    Canada Research Chairs

Similar grants from the National Natural Science Foundation of China (NSFC)

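Research on Quantum Field Theory without a Lagrangian Description
  • Grant number:
    24ZR1403900
  • Approval year:
    2024
  • Funding amount:
    CNY 0
  • Program type:
    Provincial or municipal project
Microscopic dynamical mechanisms of physical quantities in dusty plasmas, studied using isomorph theory
  • Grant number:
    12247163
  • Approval year:
    2022
  • Funding amount:
    CNY 180,000
  • Program type:
    Special project
Toward a general theory of intermittent aeolian and fluvial nonsuspended sediment transport
  • Grant number:
  • Approval year:
    2022
  • Funding amount:
    CNY 550,000
  • Program type:
Translation of the English monograph "FRACTIONAL INTEGRALS AND DERIVATIVES: Theory and Applications"
  • Grant number:
    12126512
  • Approval year:
    2021
  • Funding amount:
    CNY 120,000
  • Program type:
    Tianyuan Mathematics Fund project
Research on and applications of a fuzzy semantic theory of natural language based on Restriction-Centered Theory
  • Grant number:
    61671064
  • Approval year:
    2016
  • Funding amount:
    CNY 650,000
  • Program type:
    General Program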

Similar overseas grants

NewDataMetrics: Econometrics for New Data: Theory, Methods, and Applications
  • Grant number:
    EP/Z000335/1
  • Fiscal year:
    2024
  • Funding amount:
    $17,500
  • Program type:
    Research Grant
Combining Machine Learning Explanation Methods with Expectancy-Value Theory to Identify Tailored Interventions for Engineering Student Persistence
  • Grant number:
    2335725
  • Fiscal year:
    2024
  • Funding amount:
    $17,500
  • Program type:
    Standard Grant
MCA: Towards a Theory of Engineering Identity Development & Persistence of Minoritized Students with Imposter Feelings: A Longitudinal Mixed-methods Study of Developmental Networks
  • Grant number:
    2421846
  • Fiscal year:
    2024
  • Funding amount:
    $17,500
  • Program type:
    Standard Grant
CAREER: Statistical Inference in Observational Studies -- Theory, Methods, and Beyond
  • Grant number:
    2338760
  • Fiscal year:
    2024
  • Funding amount:
    $17,500
  • Program type:
    Continuing Grant
LEAPS-MPS: Applications of Algebraic and Topological Methods in Graph Theory Throughout the Sciences
  • Grant number:
    2313262
  • Fiscal year:
    2023
  • Funding amount:
    $17,500
  • Program type:
    Standard Grant
Non-Perturbative Methods in Field Theory and Many-Body Physics
  • Grant number:
    2310283
  • Fiscal year:
    2023
  • Funding amount:
    $17,500
  • Program type:
    Continuing Grant
Collaborative Research: Using Complex Systems Theory and Methods to Gauge the Gains and Persisting Challenges of Broadening Participation Initiatives
  • Grant number:
    2301197
  • Fiscal year:
    2023
  • Funding amount:
    $17,500
  • Program type:
    Standard Grant
Bayesian Learning for Spatial Point Processes: Theory, Methods, Computation, and Applications
  • Grant number:
    2412923
  • Fiscal year:
    2023
  • Funding amount:
    $17,500
  • Program type:
    Standard Grant
Development of Theoretical Design Methods of Catalysts Based on Electronic Structure Theory and Their Applications to Design and Development of High-Performance Molecular Catalysts
  • Grant number:
    22KJ0003
  • Fiscal year:
    2023
  • Funding amount:
    $17,500
  • Program type:
    Grant-in-Aid for JSPS Fellows
Collaborative Research: Randomized Feature Methods for Modeling and Dynamics: Theory and Algorithms
  • Grant number:
    2331033
  • Fiscal year:
    2023
  • Funding amount:
    $17,500
  • Program type:
    Standard Grant