Methodology for Improving Public Use Data Dissemination Via Multiply-Imputed, Partially Synthetic Data
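Basic information
- Award number: 0751671
- Principal investigator:
- Amount: $180,000
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2008
- Funding country: United States
- Project period: 2008-06-01 to 2011-05-31
- Project status: Completed
- Source:
- Keywords:
Project abstract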
Statistical agencies and other organizations that disseminate data to the public are ethically and often legally required to protect the confidentiality of respondents' identities and sensitive attributes. To satisfy these requirements, agencies can release multiply-imputed, partially synthetic data. These comprise the units originally surveyed with some values, such as sensitive values at high risk of disclosure or values of key identifiers, replaced with multiple imputations. This research improves the risk-utility profile of partially synthetic data approaches by addressing four key issues in their implementation. First, the research develops methods for quantifying identification disclosure risks for partially synthetic data sets. These measures account for (i) the information existing in all the synthetic data sets, (ii) various assumptions about intruder knowledge and behavior, and (iii) the details released about the synthetic data generation model. This information is crucial to data producers seeking to evaluate the protection afforded by synthetic data. Second, the research provides strategies that data producers can use to select values to synthesize. The strategies optimize the trade-offs between risk and utility for candidate sets of values. Third, the research yields strategies for selecting synthetic data sets. For example, the data producer can throw out synthetic data sets that are too high in disclosure risk or too low in data utility. The research produces guidelines for how such selection impacts inferences made using existing methods, and it develops appropriate methods of inference for situations where the effects of selection are substantial. Finally, the research develops flexible, nonparametric modeling strategies for synthetic data generation based on techniques from machine learning. This improves the analytic validity of partially synthetic data approaches.
This research provides federal agencies, survey organizations, research centers, and other data producers with more and better options for public use data dissemination than exist at present. As resources available to malicious data users continue to expand, the alterations needed to protect public use data with traditional disclosure limitation techniques, such as swapping data values, adding random noise, or aggregating data, may become so extreme that, for many analyses, the released data are no longer useful. Synthetic data, on the other hand, have the potential to enable public use data dissemination while preserving data utility. Ultimately, with higher quality public use data, secondary data analysts can make more and better inferences, leading to deeper understanding of social science and policy questions.
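As a rough illustration of the idea in the abstract, the sketch below replaces only flagged values of one variable with multiple imputations and then combines estimates across the released copies. Everything in it is an assumption made for illustration and is not taken from the award record: the function names (`fit_line`, `synthesize_partially`, `combine_partially_synthetic`), the simple linear-regression synthesizer, the bootstrap refit used to approximate parameter uncertainty, and the age/income toy data. The combining rule shown (average within-copy variance plus between-copy variance divided by the number of copies) is the rule commonly cited for partially synthetic data; the abstract itself does not state it.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares intercept, slope and residual standard deviation."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, (rss / (n - 2)) ** 0.5


def synthesize_partially(xs, ys, replace, m, rng):
    """Return m copies of ys in which only flagged entries are synthetic."""
    n = len(xs)
    copies = []
    for _ in range(m):
        # Refit on a bootstrap resample so each copy reflects parameter
        # uncertainty (a stand-in for a proper posterior draw).
        idx = [rng.randrange(n) for _ in range(n)]
        a, b, s = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        copies.append([
            rng.gauss(a + b * x, s) if flag else y  # unflagged values stay as collected
            for x, y, flag in zip(xs, ys, replace)
        ])
    return copies


def combine_partially_synthetic(estimates, variances):
    """Combine per-copy point estimates and their variance estimates."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # overall point estimate
    b = statistics.variance(estimates)    # between-copy variance
    u_bar = statistics.fmean(variances)   # average within-copy variance
    return q_bar, u_bar + b / m


if __name__ == "__main__":
    rng = random.Random(0)
    age = [rng.uniform(20, 65) for _ in range(500)]
    income = [20_000 + 800 * a + rng.gauss(0, 5_000) for a in age]
    flagged = [a > 55 for a in age]  # hypothetical "high risk" flag
    copies = synthesize_partially(age, income, flagged, m=5, rng=rng)
    means = [statistics.fmean(c) for c in copies]
    within = [statistics.variance(c) / len(c) for c in copies]
    print(combine_partially_synthetic(means, within))
```

A data producer following the abstract's third point could compute a risk or utility score for each element of `copies` and drop the ones that fail a threshold before release; the abstract notes that such selection can affect inferences, so the combining rule above may not remain valid in that case.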
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Jerome Reiter
The impact of lead and other exposures on early school performance
- DOI: 10.1016/j.ntt.2008.03.018
- Publication date: 2008-05-01
- Journal:
- Impact factor:
- Authors: Jerome Reiter; Dohyeong Kim; Andy Hull; Marie Lynn Miranda
- Corresponding author: Marie Lynn Miranda
Other grants held by Jerome Reiter
Enhancing Synthetic Data Techniques for Practical Applications
- Award number: 2217456
- Fiscal year: 2022
- Amount: $180,000
- Award type: Standard Grant
Leveraging Auxiliary Information on Marginal Distributions in Multiple Imputation for Survey Nonresponse
- Award number: 1733835
- Fiscal year: 2017
- Amount: $180,000
- Award type: Standard Grant
CIF21 DIBBs: An Integrated System for Public/Private Access to Large-Scale, Confidential Social Science Data
- Award number: 1443014
- Fiscal year: 2015
- Amount: $180,000
- Award type: Standard Grant
NCRN-MN: Triangle Census Research Network
- Award number: 1131897
- Fiscal year: 2011
- Amount: $180,000
- Award type: Standard Grant
Multiple Imputation Methods for Handling Missing Data in Longitudinal Studies with Refreshment Samples
- Award number: 1061241
- Fiscal year: 2011
- Amount: $180,000
- Award type: Standard Grant
TC: Large: Collaborative Research: Practical Privacy: Metrics and Methods for Protecting Record-level and Relational Data
- Award number: 1012141
- Fiscal year: 2010
- Amount: $180,000
- Award type: Continuing Grant
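Similar National Natural Science Foundation of China (NSFC) grants
Improving modelling of compact binary evolution.
- Award number: 10903001
- Approval year: 2009
- Amount: CNY 200,000
- Award type: Young Scientists Fund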
Similar overseas grants
A National Multidisciplinary Priority-Setting Summit: Grab bar installation as a public health solution for preventing falls, reducing injury and improving bathroom accessibility for older adults in Canada
- Award number: 480820
- Fiscal year: 2023
- Amount: $180,000
- Award type: Miscellaneous Programs
Improving Public Awareness and Driving Multi-level Action on the Causes of Poverty to Support Financial Wellbeing
- Award number: 485645
- Fiscal year: 2023
- Amount: $180,000
- Award type: Miscellaneous Programs
NSF-SSRC: An Intention-Action Framework for Improving the Impact of Public Health Initiatives
- Award number: 2317430
- Fiscal year: 2023
- Amount: $180,000
- Award type: Continuing Grant
SBIR Phase II: Improving fleet operational metrics through service optimization with automated learning of vehicle energy performance models for zero-emission public transport
- Award number: 2220811
- Fiscal year: 2023
- Amount: $180,000
- Award type: Cooperative Agreement
Improving public engagement for maximised benefits of forest and woodland expansion and creation
- Award number: NE/Y004167/1
- Fiscal year: 2023
- Amount: $180,000
- Award type: Research Grant
Improving flexibility and performance of the Acute Care Enhanced Surveillance (ACES) System for public health surveillance: an ensemble of state-of-the-art machine learning and rule-based natural language processing methods
- Award number: 468864
- Fiscal year: 2022
- Amount: $180,000
- Award type: Operating Grants
Statistical Methods for Improving Real-Time Public Health Surveillance and Integrated Outbreak Detection
- Award number: 10682401
- Fiscal year: 2022
- Amount: $180,000
- Award type:
Infectious Disease Genomic Contextual Data Harmonization: Improving Public Health Investigations via User-Engagement, Ontologies, and Open Data Specifications
- Award number: 475749
- Fiscal year: 2022
- Amount: $180,000
- Award type: Studentship Programs
Improving public awareness of the vulnerability of fingerprint authentication by establishing methods for evaluating presentation attacks and detection performance
- Award number: 22K21314
- Fiscal year: 2022
- Amount: $180,000
- Award type: Grant-in-Aid for Research Activity Start-up
Improving public transport ridership through a better understanding of transport accessibility, transit service quality and users' perceptions
- Award number: RGPIN-2019-06032
- Fiscal year: 2022
- Amount: $180,000
- Award type: Discovery Grants Program - Individual













