Collaborative Research: Information-Based Subdata Selection Inspired by Optimal Design of Experiments

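Basic information

  • Award number:
    1935729
  • Principal investigator:
  • Amount:
    $60,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2019
  • Funding country:
    United States
  • Start and end dates:
    2019-05-10 to 2022-06-30
  • Project status:
    Completed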

Project abstract

Extraordinary amounts of data are collected in many branches of science, in industry, and in government. These massive amounts of data provide incredible opportunities for making knowledge-based decisions and for advancing complicated research problems through data-driven discoveries. To capitalize on these opportunities, it is critical to develop methodology that facilitates the extraction of useful information from massive data in a computationally efficient way. Even the simplest analyses can be computationally intensive, or no longer feasible, for big data. It is, however, often the case that valid conclusions can be drawn by considering only some of the data, referred to as subdata. This project develops optimal strategies for selecting subdata that retain, as much as possible, the relevant information available in the massive data set. The methodology helps to identify the most informative data points, after which an analysis can proceed based on the selected subdata only. This facilitates data-driven decisions, scientific discoveries, and technological breakthroughs with computing resources that are readily available.

Existing investigations for extracting information from big data with common computing power have focused on random subsampling-based approaches, whose limitation is that the amount of information extracted scales only with the subdata size, not the full data size. This project develops and expands the Information-Based Optimal Subdata Selection (IBOSS) method proposed by the PIs in the following directions: 1) it combines IBOSS with sparse variable selection methods in linear regression; 2) it develops subdata selection methods for generalized linear models; 3) it constructs computationally efficient algorithms for selecting the most informative subdata; and 4) it develops user-friendly software that supports the methodology. The research is a significant addition to the field of big data science. It advances a new method for dealing with big data and has the potential to create novel research opportunities in statistical science and other quantitative fields. The results are valuable even when supercomputers are available, because cutting-edge high-performance computing facilities will always trail the exponential growth of data volume.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
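The abstract does not spell out how the most informative data points are chosen. As a rough illustration only, the sketch below assumes the selection rule usually associated with IBOSS for linear regression: for each covariate in turn, keep the rows with the most extreme values (the r smallest and r largest among rows not yet selected), then fit ordinary least squares on the selected rows alone. The function name `iboss_select`, the simulated data, and all parameter values are hypothetical and are not taken from the project's software.

```python
import numpy as np

def iboss_select(X, k):
    """Pick about k row indices by taking extreme values of each covariate.

    Assumed rule (not stated in the abstract): with p covariates, take
    r = k // (2 * p) rows from each tail of each covariate, skipping rows
    that were already selected for an earlier covariate.
    """
    n, p = X.shape
    r = k // (2 * p)
    if r < 1 or 2 * r * p > n:
        raise ValueError("need 2 * p <= k <= n")
    available = np.ones(n, dtype=bool)
    chosen = []
    for j in range(p):
        idx = np.flatnonzero(available)               # rows still unselected
        order = idx[np.argsort(X[idx, j], kind="stable")]
        picked = np.concatenate([order[:r], order[-r:]])  # both tails
        available[picked] = False
        chosen.append(picked)
    return np.concatenate(chosen)

# Simulated full data: n rows, p covariates, a linear model with noise.
rng = np.random.default_rng(0)
n, p, k = 100_000, 5, 1_000
X = rng.normal(size=(n, p))
beta = np.arange(1, p + 1, dtype=float)
y = 2.0 + X @ beta + rng.normal(size=n)

# Select the subdata, then analyze only those rows.
sel = iboss_select(X, k)
A = np.column_stack([np.ones(len(sel)), X[sel]])      # add intercept column
beta_hat, *_ = np.linalg.lstsq(A, y[sel], rcond=None)
```

Sorting each column is used here for readability; a partial selection such as `np.partition` would avoid the full sort on very large data.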

Project outputs

Journal articles (11)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Selection of Two-Level Supersaturated Designs for Main Effects Models
  • DOI:
    10.1080/00401706.2022.2102080
  • Publication year:
    2023
  • Journal:
  • Impact factor:
    2.5
  • Authors:
    Singh, Rakhi; Stufken, John
  • Corresponding author:
    Stufken, John
Efficient orthogonal functional magnetic resonance imaging designs in the presence of drift
Comments on: Data science, big data and statistics
  • DOI:
    10.1007/s11749-019-00643-9
  • Publication year:
    2019
  • Journal:
  • Impact factor:
    1.3
  • Authors:
    Nachtsheim, Abigael C.; Stufken, John
  • Corresponding author:
    Stufken, John
Standing on the Shoulders of a Giant: The Life and Work of Samad Hedayat
Locally D-optimal Designs for Binary Responses in the Presence of Factorial Effects
Other publications by John Stufken

Variance Approximation Under Balanced Sampling Plans Excluding Adjacent Units
Approximations of the information matrix for a panel mixed logit model

Other grants by John Stufken

Collaborative Research: Design-Based Optimal Subdata Selection Using Mixture-of-Experts Models to Account for Big Data Heterogeneity
  • Award number:
    2210576
  • Fiscal year:
    2022
  • Funding amount:
    $60,000
  • Project type:
    Standard Grant
Collaborative Research: Design-Based Optimal Subdata Selection Using Mixture-of-Experts Models to Account for Big Data Heterogeneity
  • Award number:
    2304767
  • Fiscal year:
    2022
  • Funding amount:
    $60,000
  • Project type:
    Standard Grant
Collaborative Research: Information-Based Subdata Selection Inspired by Optimal Design of Experiments
  • Award number:
    1811363
  • Fiscal year:
    2018
  • Funding amount:
    $60,000
  • Project type:
    Standard Grant
Collaborative research: A major leap forward: Optimal designs for correlated data, multiple objectives, and multiple covariates
  • Award number:
    1506125
  • Fiscal year:
    2014
  • Funding amount:
    $60,000
  • Project type:
    Continuing Grant
Collaborative research: A major leap forward: Optimal designs for correlated data, multiple objectives, and multiple covariates
  • Award number:
    1406760
  • Fiscal year:
    2014
  • Funding amount:
    $60,000
  • Project type:
    Continuing Grant
Design and Analysis of Experiments
  • Award number:
    1217801
  • Fiscal year:
    2012
  • Funding amount:
    $60,000
  • Project type:
    Standard Grant
Dimension Reduction, Model Selection and Classification in Functional Data Analysis.
  • Award number:
    1105634
  • Fiscal year:
    2011
  • Funding amount:
    $60,000
  • Project type:
    Standard Grant
Optimal Design for Non-Linear Models, With an Emphasis on Categorical Data
  • Award number:
    1007507
  • Fiscal year:
    2010
  • Funding amount:
    $60,000
  • Project type:
    Continuing Grant
Collaborative Research: Optimal Design of Experiments for Categorical Data
  • Award number:
    0706917
  • Fiscal year:
    2007
  • Funding amount:
    $60,000
  • Project type:
    Continuing Grant
Mathematical Sciences: Design of Experiments: Improving Practicability of Some Useful Concepts
  • Award number:
    9504882
  • Fiscal year:
    1995
  • Funding amount:
    $60,000
  • Project type:
    Standard Grant

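Similar NSFC grants

Research on Quantum Field Theory without a Lagrangian Description
  • Award number:
    24ZR1403900
  • Approval year:
    2024
  • Funding amount:
    0 CNY
  • Project type:
    Provincial/municipal project
Cell Research
  • Award number:
    31224802
  • Approval year:
    2012
  • Funding amount:
    240,000 CNY
  • Project type:
    Special fund project
Cell Research
  • Award number:
    31024804
  • Approval year:
    2010
  • Funding amount:
    240,000 CNY
  • Project type:
    Special fund project
Cell Research (细胞研究)
  • Award number:
    30824808
  • Approval year:
    2008
  • Funding amount:
    240,000 CNY
  • Project type:
    Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
  • Award number:
    10774081
  • Approval year:
    2007
  • Funding amount:
    450,000 CNY
  • Project type:
    General Program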

Similar overseas grants

Collaborative Research: Spintronics Enabled Stochastic Spiking Neural Networks with Temporal Information Encoding
  • Award number:
    2333881
  • Fiscal year:
    2024
  • Funding amount:
    $60,000
  • Project type:
    Standard Grant
Collaborative Research: Spintronics Enabled Stochastic Spiking Neural Networks with Temporal Information Encoding
  • Award number:
    2333882
  • Fiscal year:
    2024
  • Funding amount:
    $60,000
  • Project type:
    Standard Grant
Collaborative Research: Road Information Discovery through Privacy-Preserved Collaborative Estimation in Connected Vehicles
  • Award number:
    2422579
  • Fiscal year:
    2024
  • Funding amount:
    $60,000
  • Project type:
    Standard Grant
Collaborative Research: Frameworks: Automated Quality Assurance and Quality Control for the StraboSpot Geologic Information System and Observational Data
  • Award number:
    2311822
  • Fiscal year:
    2023
  • Funding amount:
    $60,000
  • Project type:
    Standard Grant
Collaborative Research: SaTC: TTP: Medium: iDRAMA.cloud: A Platform for Measuring and Understanding Information Manipulation
  • Award number:
    2247867
  • Fiscal year:
    2023
  • Funding amount:
    $60,000
  • Project type:
    Continuing Grant
Collaborative Research: HNDS-R: Polarization, Information Integrity, and Diffusion
  • Award number:
    2242072
  • Fiscal year:
    2023
  • Funding amount:
    $60,000
  • Project type:
    Standard Grant
Collaborative Research: Frameworks: Automated Quality Assurance and Quality Control for the StraboSpot Geologic Information System and Observational Data
  • Award number:
    2311821
  • Fiscal year:
    2023
  • Funding amount:
    $60,000
  • Project type:
    Standard Grant
Collaborative Research: SaTC: CORE: Medium: Information Integrity: A User-centric Intervention
  • Award number:
    2323795
  • Fiscal year:
    2023
  • Funding amount:
    $60,000
  • Project type:
    Standard Grant
Collaborative Research: Visual Information about surface curvature from patterns of image shading and contours
  • Award number:
    2238180
  • Fiscal year:
    2023
  • Funding amount:
    $60,000
  • Project type:
    Standard Grant
Collaborative Research: Visual Information about surface curvature from patterns of image shading and contours
  • Award number:
    2238179
  • Fiscal year:
    2023
  • Funding amount:
    $60,000
  • Project type:
    Standard Grant