APPLYING RESAMPLING TECHNIQUES TO LARGE DATA SETS
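Basic information
- Grant number: 6388051
- Principal investigator:
- Amount: $350,400
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 1999
- Funding country: United States
- Project period: 1999-01-07 to 2003-07-31
- Project status: Completed
- Source:
- Keywords: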
Project abstract
This project will investigate the feasibility and merit of applying bootstrapping and similar resampling strategies to the analysis of relatively large census and survey microdata files. Although bootstrapping has generally been applied most fruitfully to small-sample research designs, new technology now allows resampling and bootstrapping to be applied effectively to much larger data sets than have previously been analyzed with these techniques. In particular, we will focus on two aims: (1) determining confidence intervals for frequency counts, percentages, and summary statistics for basic multivariate analyses from large census and survey data files; and (2) assessing the potential for resampling techniques to assist in masking sensitive information extracted from data sets in which confidentiality of the respondents (disclosure avoidance) is an important concern and where minimal perturbation of the data is desired. A computational tool that uses an existing parallel high-performance computing environment and is optimized for resampling will be created to facilitate the implementation and testing of resampling techniques such as bootstrapping on data sets of 10,000-50,000 records. PROPOSED COMMERCIAL APPLICATION: Incorporating resampling into our own information system, PDQ-Explore, will increase its value to data users in the fields of social science, health care, community services, and commercial information. Licensing the software to other information providers who need confidence intervals will broaden our customer base. Protecting confidentiality will allow us to tap more data sources and make more data sets available to more users in research, education, government, and commerce.
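As a rough illustration of the first aim, the sketch below computes a percentile-bootstrap confidence interval for a percentage on a file of 10,000 records. It is not the project's tool or the PDQ-Explore implementation: the function name `bootstrap_percent_ci`, the simulated records, the 30% base rate, and the choice of 500 resamples are all assumptions made for the example.

```python
import random


def bootstrap_percent_ci(flags, n_resamples=500, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the percentage of True flags.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n = len(flags)
    estimates = []
    for _ in range(n_resamples):
        # Draw n records with replacement and recompute the percentage.
        resample = rng.choices(flags, k=n)
        estimates.append(100.0 * sum(resample) / n)
    estimates.sort()
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the estimates.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Simulated microdata (an assumption): 10,000 records, about 30% of which
# carry the attribute being counted.
data_rng = random.Random(42)
records = [data_rng.random() < 0.30 for _ in range(10_000)]

observed = 100.0 * sum(records) / len(records)
low, high = bootstrap_percent_ci(records)
print(f"observed {observed:.2f}%, 95% CI ({low:.2f}%, {high:.2f}%)")
```

Each resample is independent of the others, so the loop could in principle be spread across the processors of a parallel computing environment like the one the abstract mentions; this sketch runs serially for simplicity.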
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by ALBERT F ANDERSON
DISTRIBUTION AND SUPPORT OF THE CENSUS SUPER SAMPLE
- Grant number: 2867614
- Fiscal year: 1999
- Funding amount: $350,400
- Project category:
APPLYING RESAMPLING TECHNIQUES TO LARGE DATA SETS
- Grant number: 2790415
- Fiscal year: 1999
- Funding amount: $350,400
- Project category:
DISTRIBUTION AND SUPPORT OF THE CENSUS SUPER SAMPLE
- Grant number: 6388106
- Fiscal year: 1999
- Funding amount: $350,400
- Project category:
DISTRIBUTION AND SUPPORT OF THE CENSUS SUPER SAMPLE
- Grant number: 6697653
- Fiscal year: 1999
- Funding amount: $350,400
- Project category:
DISTRIBUTION AND SUPPORT OF THE CENSUS SUPER SAMPLE
- Grant number: 6209960
- Fiscal year: 1999
- Funding amount: $350,400
- Project category:
APPLYING RESAMPLING TECHNIQUES TO LARGE DATA SETS
- Grant number: 6140490
- Fiscal year: 1999
- Funding amount: $350,400
- Project category: