Balancing Disclosure Risk with Inferential Power: Software for Intervalized Data


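Basic Information

  • Grant number:
    8517848
  • Principal investigator:
  • Amount:
    $236,100
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2012
  • Funding country:
    United States
  • Project period:
    2012-08-01 to 2014-05-31
  • Project status:
    Completed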

Project Abstract

DESCRIPTION (provided by applicant): Patient data collected during health care delivery and public health surveys possess a great deal of information that could be used in biomedical and epidemiological research. Access to these data, however, is usually limited because of the private nature of most personal health records. Methods of balancing the informativeness of data for research with the information loss required to minimize disclosure risk are needed before these data can be used to improve public health.

Current methods are primarily focused on protecting privacy, but focusing on protecting privacy alone is inadequate. In statistical disclosure control techniques, information truthfulness is not well preserved so that unreliable results may be released. In generalization-based anonymization approaches, there is information loss due to attribute generalization and existing techniques do not provide sufficient control for maintaining data utility. What are currently needed are methods that protect both the privacy of individuals represented in the data as well as the integrity of relationships studied by researchers.

The problem is that there is an inherent tradeoff between protecting the privacy of individuals and protecting the informativeness of the data set. Protecting the privacy of individuals always results in a loss of information and it is the information contained by the data set that affects the power of a statistical test. For a given anonymization strategy, however, there are often multiple ways of masking the data that meet the disclosure risk criteria provided. This can be taken advantage of to choose the solution that best preserves statistical information while meeting the disclosure risk criteria provided.

This project will develop the first integrated software system that provides solutions for problems faced in all three stages in the release of sensitive health care data:

  1. anonymize a data set by intervalizing/generalizing data to satisfy currently available anonymization strategies,
  2. provide sufficient controls within anonymization procedures to satisfy constraints on statistical usefulness of the data, and
  3. compute statistical tests for the anonymized data intervals.

There are two main challenges facing this effort. The first is that, based on existing research results, integrating our proposed new control processes into anonymization procedures is expected to be computationally difficult. We will overcome this challenge by developing efficient and practically useful greedy algorithms, approximation algorithms, or algorithms working for realistic situations (if not for general cases). The other primary challenge facing this effort is the fact that statistical calculations with interval data sets are known to be computationally difficult, and these calculations are necessary both for control processes within anonymization procedures and for subsequent statistical computation and tests. We will overcome this challenge with efficient algorithms that exploit the structure present in data sets intervalized for privacy.

The software will be tested on medical data sets of various sizes and structures to demonstrate the feasibility of the approach and to characterize the scalability of the algorithms with data set size.
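
As a rough illustration of the first and third stages described in the abstract, the following Python sketch intervalizes a single numeric attribute and then bounds a statistic computed from the resulting intervals. It is not the project's software: the function names `intervalize` and `mean_bounds`, the grouping rule (sorted neighbours in groups of at least `k`), and the sample ages are all assumptions made for illustration. The sample mean is used because it is one of the easy cases; the abstract notes that interval statistics in general are computationally difficult.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def intervalize(values: List[float], k: int) -> List[Interval]:
    """Replace each value with the [min, max] range of a group of
    at least k neighbouring values (hypothetical grouping rule)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(values)
    if n < k:
        raise ValueError("need at least k values")

    # Indices of the records, ordered by value.
    order = sorted(range(n), key=lambda i: values[i])
    result: List[Interval] = [(0.0, 0.0)] * n

    start = 0
    while start < n:
        end = start + k
        # Fold a short tail into this group so no group has fewer than k records.
        if n - end < k:
            end = n
        group = order[start:end]
        lo, hi = values[group[0]], values[group[-1]]
        for i in group:
            result[i] = (lo, hi)
        start = end
    return result


def mean_bounds(intervals: List[Interval]) -> Interval:
    """Tightest bounds on the sample mean when each true value is only
    known to lie somewhere inside its interval."""
    n = len(intervals)
    lower = sum(lo for lo, _ in intervals) / n
    upper = sum(hi for _, hi in intervals) / n
    return lower, upper


if __name__ == "__main__":
    ages = [34, 29, 41, 52, 38, 30, 47]  # made-up sample data
    masked = intervalize(ages, k=3)
    print(masked)
    print(mean_bounds(masked))
```

Every interval in the output is shared by at least `k` records, and the true mean of the original values always falls between the two bounds returned by `mean_bounds`. Wider intervals lower disclosure risk but widen those bounds, which is the tradeoff the abstract describes.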

Project Outcomes

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Data Anonymization that Leads to the Most Accurate Estimates of Statistical Characteristics: Fuzzy-Motivated Approach.
  • DOI:
    10.1109/ifsa-nafips.2013.6608471
  • Publication year:
    2013
  • Journal:
  • Impact factor:
    0
  • Authors:
    Xiang,G;Ferson,S;Ginzburg,L;Longpré,L;Mayorga,E;Kosheleva,O
  • Corresponding author:
    Kosheleva,O

Other grants by SCOTT D FERSON

Balancing Disclosure Risk with Inferential Power: Software for Intervalized Data
  • Grant number:
    8251091
  • Fiscal year:
    2012
  • Funding amount:
    $236,100
  • Project category:
Compensating for Uncertainty Biases in Health Risk Judgments
  • Grant number:
    7926647
  • Fiscal year:
    2010
  • Funding amount:
    $236,100
  • Project category:
Safe environmental concentrations under uncertainty
  • Grant number:
    6337570
  • Fiscal year:
    2001
  • Funding amount:
    $236,100
  • Project category:
Safe environment concentrations under uncertainty
  • Grant number:
    6788050
  • Fiscal year:
    2000
  • Funding amount:
    $236,100
  • Project category:
Safe environment concentrations under uncertainty
  • Grant number:
    6645820
  • Fiscal year:
    2000
  • Funding amount:
    $236,100
  • Project category:
QUALITY ASSURANCE FOR ENVIRONMENTAL RISKS
  • Grant number:
    2896800
  • Fiscal year:
    1996
  • Funding amount:
    $236,100
  • Project category:
QUALITY ASSURANCE FOR ENVIRONMENTAL RISKS
  • Grant number:
    2864025
  • Fiscal year:
    1996
  • Funding amount:
    $236,100
  • Project category:
CONSERVATIVE RISK ANALYSIS USING DEPENDENCY BOUNDS
  • Grant number:
    2018482
  • Fiscal year:
    1996
  • Funding amount:
    $236,100
  • Project category:
DETECTING DISEASE CLUSTERS IN STRUCTURED ENVIRONMENTS
  • Grant number:
    2187080
  • Fiscal year:
    1993
  • Funding amount:
    $236,100
  • Project category:
DETECTING DISEASE CLUSTERS IN STRUCTURED ENVIRONMENTS
  • Grant number:
    2430476
  • Fiscal year:
    1993
  • Funding amount:
    $236,100
  • Project category:

Similar Overseas Grants

CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
  • Grant number:
    2337776
  • Fiscal year:
    2024
  • Funding amount:
    $236,100
  • Project category:
    Continuing Grant
CAREER: From Dynamic Algorithms to Fast Optimization and Back
  • Grant number:
    2338816
  • Fiscal year:
    2024
  • Funding amount:
    $236,100
  • Project category:
    Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
  • Grant number:
    2338846
  • Fiscal year:
    2024
  • Funding amount:
    $236,100
  • Project category:
    Continuing Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
  • Grant number:
    2348261
  • Fiscal year:
    2024
  • Funding amount:
    $236,100
  • Project category:
    Standard Grant
CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
  • Grant number:
    2348346
  • Fiscal year:
    2024
  • Funding amount:
    $236,100
  • Project category:
    Standard Grant
CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
  • Grant number:
    2348457
  • Fiscal year:
    2024
  • Funding amount:
    $236,100
  • Project category:
    Standard Grant
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
  • Grant number:
    2404989
  • Fiscal year:
    2024
  • Funding amount:
    $236,100
  • Project category:
    Standard Grant
CAREER: Efficient Algorithms for Modern Computer Architecture
  • Grant number:
    2339310
  • Fiscal year:
    2024
  • Funding amount:
    $236,100
  • Project category:
    Continuing Grant
CAREER: Improving Real-world Performance of AI Biosignal Algorithms
  • Grant number:
    2339669
  • Fiscal year:
    2024
  • Funding amount:
    $236,100
  • Project category:
    Continuing Grant
DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
  • Grant number:
    EP/Y029089/1
  • Fiscal year:
    2024
  • Funding amount:
    $236,100
  • Project category:
    Research Grant