APPLYING RESAMPLING TECHNIQUES TO LARGE DATA SETS
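Basic information
- Grant number: 6140490
- Principal investigator:
- Amount: $365,700
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 1999
- Funding country: United States
- Project period: 1999-01-07 to 2002-07-31
- Project status: Completed
- Source:
- Keywords:
Project abstract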
This project will investigate the feasibility and merit of applying bootstrapping and similar resampling strategies to the analysis of relatively large census and survey microdata files. While bootstrapping has generally been applied most fruitfully to small-sample research designs, new technology now allows resampling and bootstrapping to be applied effectively to much larger data sets than have previously been analyzed with these techniques. In particular, we will focus on two aims: (1) determining confidence intervals for frequency counts, percentages, and summary statistics for basic multivariate analyses from large census and survey data files; and (2) assessing the potential for resampling techniques to assist in masking sensitive information extracted from data sets in which confidentiality of the respondents (disclosure avoidance) is an important concern and where minimal perturbation of the data is desired. A computational tool that uses an existing parallel high-performance computing environment and is optimized for resampling will be created to facilitate the implementation and testing of resampling techniques such as bootstrapping on data sets of 10,000-50,000 records.
PROPOSED COMMERCIAL APPLICATION: Incorporating resampling into our own information system, PDQ-Explore, will increase its value to data users in the fields of social science, health care, community services, and commercial information. Licensing the software to other information providers who need confidence intervals will broaden our customer base. Protecting confidentiality will allow us to tap more data sources and make more data sets available to more users in research, education, government, and commerce.
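As an illustration of the first aim, a percentile bootstrap for the confidence interval of a percentage can be sketched as follows. This is a minimal, single-process sketch and not the project's tool: the function name `bootstrap_percentage_ci`, its parameters, and the record layout are assumptions made for the example, and the abstract does not say which bootstrap variant or parallel scheme was used.

```python
import random


def bootstrap_percentage_ci(records, predicate, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the percentage of
    records that satisfy `predicate`.

    Hypothetical helper for illustration only; names and defaults are
    assumptions, not part of PDQ-Explore.
    Returns (point_estimate, lower_bound, upper_bound) in percent.
    """
    if not records:
        raise ValueError("records must not be empty")
    rng = random.Random(seed)

    # Reduce each record to a 0/1 indicator once, so resampling is cheap.
    flags = [1 if predicate(r) else 0 for r in records]
    n = len(flags)

    estimates = []
    for _ in range(n_resamples):
        # Draw n indicators with replacement and recompute the percentage.
        resample = rng.choices(flags, k=n)
        estimates.append(100.0 * sum(resample) / n)
    estimates.sort()

    # Take the alpha/2 and 1 - alpha/2 empirical quantiles of the estimates.
    lo_idx = max(0, int((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)

    point = 100.0 * sum(flags) / n
    return point, estimates[lo_idx], estimates[hi_idx]
```

A call such as `bootstrap_percentage_ci(records, lambda r: r["employed"])` would return the observed percentage together with an approximate 95% interval, where `"employed"` stands in for any yes/no field in the microdata. Each resample is independent of the others, which is why this kind of loop lends itself to the parallel computing environment the abstract mentions.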
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by ALBERT F ANDERSON
DISTRIBUTION AND SUPPORT OF THE CENSUS SUPER SAMPLE
- Grant number: 2867614
- Fiscal year: 1999
- Funding amount: $365,700
- Project category:
APPLYING RESAMPLING TECHNIQUES TO LARGE DATA SETS
- Grant number: 2790415
- Fiscal year: 1999
- Funding amount: $365,700
- Project category:
DISTRIBUTION AND SUPPORT OF THE CENSUS SUPER SAMPLE
- Grant number: 6388106
- Fiscal year: 1999
- Funding amount: $365,700
- Project category:
DISTRIBUTION AND SUPPORT OF THE CENSUS SUPER SAMPLE
- Grant number: 6697653
- Fiscal year: 1999
- Funding amount: $365,700
- Project category:
APPLYING RESAMPLING TECHNIQUES TO LARGE DATA SETS
- Grant number: 6388051
- Fiscal year: 1999
- Funding amount: $365,700
- Project category:
DISTRIBUTION AND SUPPORT OF THE CENSUS SUPER SAMPLE
- Grant number: 6209960
- Fiscal year: 1999
- Funding amount: $365,700
- Project category: