ITR-(ECS+ASE)-(dmc+int): Info Tech Challenges for Secure Access to Confidential Social Science Data
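Basic information
- Award number: 0427889
- Principal investigator:
- Amount: $2.938 million
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2004
- Funding country: United States
- Start and end dates: 2004-10-01 to 2009-09-30
- Project status: Completed
- Source:
- Keywords:
Project abstract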
Census Research Data Centers (RDCs), based in Ann Arbor, Berkeley, Boston, Chicago, Durham, Ithaca, Los Angeles, New York City, and Washington, provide approved scientists with access to confidential Census data for research that directly benefits both the Census Bureau and society. The RDC directors, administrators, board members and researchers, together with the Center for Economic Studies (CES) and the Longitudinal Employer-Household Dynamics (LEHD) Program, constitute a collaborative research network that is building and supporting a secure distributed computer network that enables research that is critical to our economic and civic prosperity and security. The network operates under physical security constraints dictated by Census and the Internal Revenue Service. The constraints essentially eliminate the possibility of distributing the computations to facilities outside of the Bureau's main computing facility. Instead, the researchers use the RDCs as supervised remote access facilities that provide a secure, encrypted connection to the RDC computing network. This project addresses the technical and logistical issues raised by the creation, maintenance, and growth of the RDC network while maintaining the confidentiality guaranteed to participants in Census data. The RDCs and LEHD will lead a new wave of research with the development of innovative, large-scale linked data products that integrate Census Bureau surveys, censuses and administrative records with data from state governments and surveys conducted by private institutions. Both CES and LEHD have extensive experience in creating these products. The RDC network researchers will enhance that experience and contribute their own expertise to the data linking research. The newly created data will be richer than any presently available to researchers with no increase in respondent burden. They will also raise complicated and vexing issues regarding disclosure avoidance and participant privacy.
The project also creates synthetic versions of these confidential data sets. This will increase the accessibility of these data to social science researchers while preserving the confidentiality of private information. Synthetic and partially synthetic data are new confidentiality protection techniques that rely on computationally intensive sampling from the posterior predictive distribution of the underlying confidential data. The result is micro-data that preserve important analytical properties of the original data and are, thus, inference-valid. The synthetic versions of confidential data are for public use. At the same time, ongoing research within the RDCs using the gold-standard confidential data will constantly test the quality of the synthesized data and allow for continuous improvement. As a result, a continuous feedback relationship will be established between the research activities conducted in RDCs on confidential Census Bureau data and the quality of the Bureau's public use data products, namely the synthetic micro-data created by these projects. In order to accomplish these computationally intensive activities, as well as to allow researchers to engage in such innovative research as agent-based simulations and geo-spatial analysis, we will install a supercluster of SMP nodes optimized for the applications of creating linked data, analyzing the gold-standard data, and processing the data to produce multiply-synthesized public use data sets. Two industry partners, Intel and Unisys, have promised to directly support the creation of this supercluster by donating 256 Itanium 2 processors and providing the computing crossbars, cluster infrastructure, and disk storage arrays at manufacturer's cost. The Linux-based system will be integrated and tuned by the proposal team from Argonne National Laboratory. The synthetic data specialists on the proposal team will port existing multi-threaded data synthesizers and develop new ones.
Broader Impacts: The research conducted in RDCs and at LEHD over the past decade has made important contributions to our understanding of essential social, economic, and environmental issues that would not have been possible without use of the confidential data accessible via the RDC network. It is difficult to overstate the significance of this research, which has used more than 30 years of longitudinally integrated establishment micro-data from the Census Business Register and Economic Censuses; confidential micro-data from all the major Census surveys (Current Population Survey, Survey of Income and Program Participation, American Housing Survey); confidential micro-data from the Decennial Censuses of Population in 1990 and 2000; longitudinally integrated Unemployment Insurance wage records, ES-202 establishment data, and Social Security Administration data; federal tax information linked to major surveys; environmental data on air quality linked to Business Register and Economic Census data; Medicaid data linked to the Survey of Income and Program Participation; and many others.
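The abstract describes synthetic data as draws from the posterior predictive distribution of the confidential data. As a rough illustration of that idea only, the sketch below synthesizes a single numeric column under a deliberately simple assumption: a normal model with a noninformative prior. The function name `synthesize`, the parameter `m`, and the example values are hypothetical and do not come from the project; the project's actual synthesizers are not described in this record and would use far richer models.

```python
import random
import statistics


def synthesize(confidential, m=3, seed=0):
    """Return m synthetic copies of one numeric column.

    Illustrative assumption: values follow a normal model with a
    noninformative prior. This is not the project's method.
    """
    rng = random.Random(seed)
    n = len(confidential)
    ybar = statistics.fmean(confidential)
    s2 = statistics.variance(confidential)  # sample variance (n - 1 denominator)
    copies = []
    for _ in range(m):
        # Posterior draw of the variance: (n - 1) * s2 / chi-square(n - 1).
        # A chi-square with k degrees of freedom is Gamma(shape=k/2, scale=2).
        chi2 = rng.gammavariate((n - 1) / 2, 2.0)
        sigma2 = (n - 1) * s2 / chi2
        # Posterior draw of the mean given the variance: Normal(ybar, sigma2 / n).
        mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)
        # Predictive draw: n new values from the model at the sampled parameters.
        copies.append([rng.gauss(mu, sigma2 ** 0.5) for _ in range(n)])
    return copies


# Hypothetical confidential values, invented for this example.
confidential_values = [31.0, 42.5, 38.2, 55.1, 47.3, 29.8, 61.0, 44.4]
synthetic_copies = synthesize(confidential_values, m=3, seed=1)
```

Each copy uses a fresh parameter draw, which is what makes the released sets "multiply-synthesized": an analyst can combine estimates across copies so that uncertainty about the model parameters is reflected in the final inference.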
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by John Abowd
Data Quality Issues
- DOI:
- Year published: 2015
- Journal:
- Impact factor: 0
- Authors: John Abowd; J. Haltiwanger; Bryce E. Stephens
- Corresponding author: Bryce E. Stephens
Differentially Private Methods for Validation Servers
- DOI:
- Year published: 2021
- Journal:
- Impact factor: 0
- Authors: Andrés F. Barrientos; Aaron R. Williams; C. Bowen; John Abowd; Jim Cilke; J. Debacker; Nada Eissa; Rick Evans; Dan Feenberg; Max Ghenis; Nick Hart; Matt Jensen; Barry Johnson; I. Lurie; Shelly Martinez; Robert Moffitt; Amy O’Hara; Jerry Reiter; Emmanuel Saez; Wade Shen; Aleksandra Slavković; Salil P. Vadhan; Lars Vilhuber
- Corresponding author: Lars Vilhuber
Other grants by John Abowd
TC: Large: Collaborative Research: Practical Privacy: Metrics and Methods for Protecting Record-level and Relational Data
- Award number: 1012593
- Fiscal year: 2010
- Amount: $2.938 million
- Award type: Continuing Grant
CDI-Type II: Collaborative Research: Integrating Statistical and Computational Approaches to Privacy
- Award number: 0941226
- Fiscal year: 2010
- Amount: $2.938 million
- Award type: Standard Grant
Joint NSF-Census-IRS Workshop on synthetic data and confidentiality protection, July 2009 Washington, DC
- Award number: 0922494
- Fiscal year: 2009
- Amount: $2.938 million
- Award type: Standard Grant
EITM: Developing the Tools to Understand Human Performance: An Empirical Infrastructure to Foster Research Collaboration
- Award number: 0339191
- Fiscal year: 2004
- Amount: $2.938 million
- Award type: Standard Grant
Dynamic Employer-Household Data and the Social Data Infrastructure
- Award number: 9978093
- Fiscal year: 1999
- Amount: $2.938 million
- Award type: Continuing Grant
Individual and Firm Heterogeneity in Labor Markets: Studies of Matched Employee-Employer Data
- Award number: 9618111
- Fiscal year: 1997
- Amount: $2.938 million
- Award type: Continuing Grant
Employment and Compensation Policies: Studies of American and French Labor Markets Using Matched Employer-Employee Data
- Award number: 9321053
- Fiscal year: 1994
- Amount: $2.938 million
- Award type: Continuing Grant
Compensation System Design, Employment and Firm Performance: An Analysis of French Microdata and a Comparison to the U.S.A.
- Award number: 9111186
- Fiscal year: 1991
- Amount: $2.938 million
- Award type: Continuing Grant
The Effects of Collective Bargaining and Threats of Unionization on Firm Investment Policy, Return on Investment, and Stock Valuations
- Award number: 8813847
- Fiscal year: 1988
- Amount: $2.938 million
- Award type: Continuing Grant
Improving the Scientific Research Utility of Labor Force Gross Flow Data
- Award number: 8513700
- Fiscal year: 1986
- Amount: $2.938 million
- Award type: Standard Grant
Similar overseas grants
ITR - (ECS+ASE) - (dmc+soc): Automated Mechanism Design
- Award number: 0427858
- Fiscal year: 2004
- Amount: $2.938 million
- Award type: Continuing Grant
ITR - (ECS+ASE+NHS) - (dmc): Richer Understanding of the Complexity of Election Systems
- Award number: 0426761
- Fiscal year: 2004
- Amount: $2.938 million
- Award type: Continuing Grant