Sharing Confidential Datasets With Geographic Identifiers Via Multiple Imputation


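Basic Information

  • Grant number:
    7774323
  • Principal investigator:
  • Amount:
    $190,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
  • Fiscal year:
    2009
  • Funding country:
    United States
  • Project period:
    2009-03-01 to 2012-01-31
  • Project status:
    Completed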

Project Abstract

DESCRIPTION (provided by applicant): Geographic data can be enormously beneficial for analyses. In studies of aging, for example, they can reveal areas where elderly people live in high densities; they can illuminate how environmental factors affect the health and quality of life of elderly people; and, through contextual data, they can yield insights into the social and economic conditions and lifestyle choices of the elderly. However, geographic variables are among the most challenging data to share when making a primary data source available to others, because fine geography enables ill-intentioned users to pinpoint the identities of individuals in the shared file. Thus, data collectors typically delete geographies or aggregate them to very high levels before sharing data. For example, the public use files of the Health and Retirement Study employ both deletion and aggregation of geography, and the Health Insurance Portability and Accountability Act requires that any geographic units on shared files comprise at least 20,000 people. These actions degrade analyses that depend on finer geographic detail, sacrificing the benefits of using geography in analysis. We develop new methods to protect confidentiality in data with geographic identifiers. Our approach is to simulate values of geography and other identifying attributes, such as age, from statistical models that capture the spatial dependencies in the collected data. These simulated values replace the collected ones when sharing data. Partially simulated datasets can preserve confidentiality, since identifying units and their sensitive data is difficult when the geographies and other quasi-identifiers in the released data are not the collected values. When the simulation models faithfully reflect the relationships in the collected data, the shared data also preserve spatial associations, avoid ecological inference problems, and provide details about the tails of distributions.
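The core idea above, replacing collected geography with model-based draws while releasing other variables as collected, can be sketched in a few lines. This is a simplified stand-in, not the project's method: it assumes a single categorical attribute (an age group) and integer region codes 0..n_regions-1, and it models region given age group with a Dirichlet-multinomial model within each age group, whereas the project's publications use richer models (multinomial probit and disease mapping models, for instance). The function name `synthesize_geography`, the prior weight `alpha`, and the toy data are all hypothetical. Each of the m copies first draws region probabilities from their posterior and then samples regions, so the copies reflect parameter uncertainty rather than repeated draws from one fitted model.

```python
import numpy as np


def synthesize_geography(age_group, region, n_regions, m=5, alpha=1.0, seed=0):
    """Return m partially synthetic copies of `region` (hypothetical helper).

    Assumes `region` holds integer codes 0..n_regions-1. Within each age
    group, region counts get a Dirichlet(alpha) prior; each copy draws the
    group's region probabilities from the posterior, then draws a synthetic
    region for every record in that group. Age group is released unchanged.
    """
    rng = np.random.default_rng(seed)
    age_group = np.asarray(age_group)
    region = np.asarray(region)
    copies = []
    for _ in range(m):
        new_region = np.empty_like(region)
        for g in np.unique(age_group):
            in_g = age_group == g
            counts = np.bincount(region[in_g], minlength=n_regions)
            # Posterior draw of P(region | age group) for this copy.
            probs = rng.dirichlet(counts + alpha)
            # Replace every collected region in the group with a model draw.
            new_region[in_g] = rng.choice(n_regions, size=int(in_g.sum()), p=probs)
        copies.append(new_region)
    return copies


# Toy data (hypothetical): 8 records, 3 age groups, 3 regions.
age = np.array([0, 0, 0, 1, 1, 1, 1, 2])
reg = np.array([0, 0, 1, 2, 2, 2, 0, 1])
copies = synthesize_geography(age, reg, n_regions=3, m=3)
for c in copies:
    print(c)
```

In this sketch the released regions are never the collected ones, which is the source of the confidentiality protection; how much analytic utility survives depends on how well the model captures the relationship between region and the other attributes.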
We have three specific aims in this proposal. First, using techniques from spatial modeling, we develop methods for simulating geographic variables conditional on attributes and for simulating attributes conditional on geography. Second, we apply our approach to a genuine dataset to evaluate the confidentiality protection and analytic utility of partially simulated data under three scenarios: only geography simulated, only non-geographic identifiers simulated, and both geographic and other identifiers simulated. Third, we compare our approach against aggregation techniques on the genuine dataset. Our long-term goal is to develop general-purpose methodology and publicly available software for sharing inference-valid, safe data that include finer geographic detail than is currently released. This will provide statistical agencies, researchers, and other data producers with more and better options for data sharing than exist at present.
PUBLIC HEALTH RELEVANCE: This research has the potential to improve the way statistical agencies, research centers, individual researchers, and other data producers share data on aging and, more broadly, any health or demographic data containing geography. Unlike existing approaches such as deletion and high-level aggregation, our approach promises to preserve fine geography and spatial relationships while protecting confidentiality. Ultimately, this enables secondary data analysts to make more and better inferences, leading to a deeper understanding of public health.
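Sharing inference-valid data from multiple simulated copies means analysts must combine results across the m released datasets. A minimal sketch, assuming the analyst has computed a point estimate q_i and its variance estimate u_i in each of m partially synthetic datasets, using the combining rules for partially synthetic data from the multiple-imputation literature (Reiter, 2003): the combined estimate is the average q̄_m, and its variance is T_p = ū_m + b_m/m, where ū_m averages the u_i and b_m is the sample variance of the q_i across datasets. Inference uses a t distribution with ν_p = (m - 1)(1 + m ū_m / b_m)² degrees of freedom. The function name and the example numbers are hypothetical.

```python
from statistics import mean, variance


def combine_partially_synthetic(estimates, variances):
    """Combine per-dataset results from m partially synthetic datasets.

    estimates: point estimates q_i, one per released dataset.
    variances: their estimated variances u_i.
    Returns (q_bar, T_p, df) following the partially synthetic data rules.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates with matching variances")
    q_bar = mean(estimates)
    b_m = variance(estimates)  # between-dataset variance, divisor m - 1
    u_bar = mean(variances)    # average within-dataset variance
    t_p = u_bar + b_m / m
    # Degrees of freedom for the t reference distribution; infinite when the
    # estimates agree exactly (b_m == 0).
    df = (m - 1) * (1 + m * u_bar / b_m) ** 2 if b_m > 0 else float("inf")
    return q_bar, t_p, df


# Hypothetical per-dataset results for one regression coefficient.
q_bar, t_p, df = combine_partially_synthetic([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q_bar, round(t_p, 4), round(df, 2))  # prints: 2.0 0.8333 12.5
```

Note the contrast with multiple imputation for missing data, where the total variance is ū_m + (1 + 1/m) b_m; with partially synthetic data the between-dataset term enters only as b_m/m.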

Project Outputs

Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Estimating Identification Disclosure Risk Using Mixed Membership Models.
Multiple-Shrinkage Multinomial Probit Models with Applications to Simulating Geographies in Public Use Data.
  • DOI:
    10.1214/13-ba816
  • Published:
    2013
  • Journal:
    Bayesian Analysis
  • Impact factor:
    4.4
  • Authors:
    Burgette, Lane F.; Reiter, Jerome P.
  • Corresponding author:
    Reiter, Jerome P.
Imputation of confidential data sets with spatial locations using disease mapping models.
  • DOI:
    10.1002/sim.6078
  • Published:
    2014-05-20
  • Journal:
    Statistics in Medicine
  • Impact factor:
    2
  • Authors:
    Paiva, Thais; Chakraborty, Avishek; Reiter, Jerry; Gelfand, Alan
  • Corresponding author:
    Gelfand, Alan
Multiple Imputation for Sharing Precise Geographies in Public Use Data.
  • DOI:
    10.1214/11-aoas506
  • Published:
    2012
  • Journal:
    The Annals of Applied Statistics
  • Impact factor:
    0
  • Authors:
    Wang, Hao; Reiter, Jerome P.
  • Corresponding author:
    Reiter, Jerome P.

Other publications by Jerome Phillip Reiter


Similar Overseas Grants

Rational design of rapidly translatable, highly antigenic and novel recombinant immunogens to address deficiencies of current snakebite treatments
  • Grant number:
    MR/S03398X/2
  • Fiscal year:
    2024
  • Funding amount:
    $190,000
  • Project type:
    Fellowship
CAREER: FEAST (Food Ecosystems And circularity for Sustainable Transformation) framework to address Hidden Hunger
  • Grant number:
    2338423
  • Fiscal year:
    2024
  • Funding amount:
    $190,000
  • Project type:
    Continuing Grant
Re-thinking drug nanocrystals as highly loaded vectors to address key unmet therapeutic challenges
  • Grant number:
    EP/Y001486/1
  • Fiscal year:
    2024
  • Funding amount:
    $190,000
  • Project type:
    Research Grant
Metrology to address ion suppression in multimodal mass spectrometry imaging with application in oncology
  • Grant number:
    MR/X03657X/1
  • Fiscal year:
    2024
  • Funding amount:
    $190,000
  • Project type:
    Fellowship
CRII: SHF: A Novel Address Translation Architecture for Virtualized Clouds
  • Grant number:
    2348066
  • Fiscal year:
    2024
  • Funding amount:
    $190,000
  • Project type:
    Standard Grant
The Abundance Project: Enhancing Cultural & Green Inclusion in Social Prescribing in Southwest London to Address Ethnic Inequalities in Mental Health
  • Grant number:
    AH/Z505481/1
  • Fiscal year:
    2024
  • Funding amount:
    $190,000
  • Project type:
    Research Grant
ERAMET - Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
  • Grant number:
    10107647
  • Fiscal year:
    2024
  • Funding amount:
    $190,000
  • Project type:
    EU-Funded
BIORETS: Convergence Research Experiences for Teachers in Synthetic and Systems Biology to Address Challenges in Food, Health, Energy, and Environment
  • Grant number:
    2341402
  • Fiscal year:
    2024
  • Funding amount:
    $190,000
  • Project type:
    Standard Grant
Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
  • Grant number:
    10106221
  • Fiscal year:
    2024
  • Funding amount:
    $190,000
  • Project type:
    EU-Funded
Recite: Building Research by Communities to Address Inequities through Expression
  • Grant number:
    AH/Z505341/1
  • Fiscal year:
    2024
  • Funding amount:
    $190,000
  • Project type:
    Research Grant