Collaborative Research: Information-Based Subdata Selection Inspired by Optimal Design of Experiments
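Basic information
- Award number: 1811291
- Principal investigator:
- Amount: $60,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2018
- Funding country: United States
- Duration: 2018-07-15 to 2022-06-30
- Project status: Completed
- Source:
- Keywords:
Project abstract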
Extraordinary amounts of data are collected in many branches of science, in industry, and in government. These massive amounts of data provide incredible opportunities for making knowledge-based decisions and for advancing complicated research problems through data-driven discoveries. To capitalize on these opportunities, it is critical to develop methodology that facilitates the extraction of useful information from massive data in a computationally efficient way. For big data, even the simplest analyses can be computationally intensive or may no longer be feasible. It is, however, often the case that valid conclusions can be drawn by considering only some of the data, referred to as subdata. This project develops optimal strategies for selecting subdata that retain, as much as possible, the relevant information available in the massive data set. The methodology helps to identify the most informative data points, after which an analysis can proceed based on the selected subdata only. This facilitates data-driven decisions, scientific discoveries, and technological breakthroughs with computing resources that are readily available.
Existing investigations for extracting information from big data with common computing power have focused on random subsampling-based approaches, whose limitation is that the amount of information extracted scales only with the subdata size, not the full data size. This project develops and expands the Information-Based Optimal Subdata Selection (IBOSS) method proposed by the PIs in the following directions: 1) it combines IBOSS with sparse variable selection methods in linear regression; 2) it develops subdata selection methods for generalized linear models; 3) it constructs computationally efficient algorithms for selecting the most informative subdata; and 4) it develops user-friendly software that supports the methodology. The research is a significant addition to the field of big data science. It advances a new method for dealing with big data and has the potential to create novel research opportunities in statistical science and other quantitative fields. The results are valuable even when supercomputers are available, because cutting-edge high-performance computing facilities will always trail the exponential growth of data volume.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
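The abstract does not spell out how informative points are chosen. As a rough illustration only, the Python sketch below follows one commonly described form of IBOSS for linear regression: for each covariate in turn, keep the r rows with the smallest values and the r rows with the largest values among the rows not yet selected, then fit ordinary least squares on the selected rows alone. The function name `iboss_select`, the subdata size `k`, and the simulated data are assumptions made for this example; none of it is taken from the project's software.

```python
import numpy as np

def iboss_select(X, k):
    """Return sorted row indices of an IBOSS-style subdata (hypothetical helper).

    For each of the p covariates, take r = k // (2 * p) rows from each tail
    (smallest and largest values) among the rows not selected so far.
    """
    n, p = X.shape
    r = k // (2 * p)
    if r < 1 or k > n:
        raise ValueError("need 2 * p <= k <= n")
    available = np.ones(n, dtype=bool)
    chosen = []
    for j in range(p):
        idx = np.flatnonzero(available)
        # A full sort keeps the sketch short; a partial selection would be cheaper.
        order = idx[np.argsort(X[idx, j], kind="stable")]
        picked = np.concatenate([order[:r], order[-r:]])
        chosen.append(picked)
        available[picked] = False
    return np.sort(np.concatenate(chosen))

# Simulated full data (an assumption for illustration): n rows, p covariates.
rng = np.random.default_rng(0)
n, p, k = 100_000, 5, 1_000
X = rng.standard_normal((n, p))
beta = np.arange(1, p + 1, dtype=float)
y = 1.0 + X @ beta + rng.standard_normal(n)

# Fit ordinary least squares using only the selected subdata.
sub = iboss_select(X, k)
A = np.column_stack([np.ones(sub.size), X[sub]])
coef, *_ = np.linalg.lstsq(A, y[sub], rcond=None)
print(sub.size, coef.round(2))
```

Because the selection is deterministic and targets the tails of each covariate, the selected rows depend on the full data, which is the contrast with random subsampling that the abstract draws.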
Project outputs
Journal articles (5)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Information-based optimal subdata selection for big data logistic regression
- DOI: 10.1016/j.jspi.2020.03.004
- Published: 2020-12
- Journal:
- Impact factor: 0.9
- Authors: Q. Cheng; Haiying Wang; Min Yang
- Corresponding authors: Q. Cheng; Haiying Wang; Min Yang
Optimal design under complete class with ancillary functions
- DOI: 10.1002/cjs.11596
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Hua, Y. and
- Corresponding author: Hua, Y. and
Support point of locally optimal designs for multinomial logistic regression models
- DOI: 10.1016/j.jspi.2020.03.006
- Published: 2020
- Journal:
- Impact factor: 0.9
- Authors: Hao, Shuai; Yang, Min
- Corresponding author: Yang, Min
On multiple-objective optimal designs
- DOI: 10.1016/j.jspi.2018.09.007
- Published: 2019-05-01
- Journal:
- Impact factor: 0.9
- Authors: Cheng, Qianshun; Yang, Min
- Corresponding author: Yang, Min
Other publications by Min Yang
Synthesis and Catalytic Activity of Composite Materials TiO2/Ti-Al-MCM-41 by Chemical Vapor Deposition (CVD)
- DOI: 10.4028/www.scientific.net/amr.97-101.1749
- Published: 2010
- Journal:
- Impact factor: 0
- Authors: H. Guan; Xiao Yang; Sheng; Min Yang
- Corresponding author: Min Yang
Hybrid malware detection approach with feedback-directed machine learning
- DOI: 10.1007/s11432-018-9615-8
- Published: 2020
- Journal:
- Impact factor: 0
- Authors: Zhetao Li; Wenli Li; Fuyuan Lin; Yi Sun; Min Yang; Y. Zhang; Zhibo Wang
- Corresponding author: Zhibo Wang
Near-Infrared Spectroscopic Study of Chlorite Minerals
- DOI: 10.1155/2018/6958260
- Published: 2018-02
- Journal:
- Impact factor: 2
- Authors: Min Yang; Meifang Ye
- Corresponding author: Meifang Ye
Multi-Turn Video Question Generation via Reinforced Multi-Choice Attention Network
- DOI: 10.1109/tcsvt.2020.3014775
- Published: 2021-05
- Journal:
- Impact factor: 8.4
- Authors: Zhaoyu Guo; Zhou Zhao; Weike Jin; Zhicheng Wei; Min Yang; Nannan Wang; Nicholas Jing Yuan
- Corresponding author: Nicholas Jing Yuan
Slowing Down the Aging of Learning-based Malware Detectors with API Knowledge
- DOI: 10.1109/tdsc.2022.3144697
- Published: 2022
- Journal:
- Impact factor: 7.3
- Authors: Xiaohan Zhang; Mi Zhang; Yuan Zhang; Ming Zhong; Xin Zhang; Yinzhi Cao; Min Yang
- Corresponding author: Min Yang
Other grants by Min Yang
Collaborative Research: Design-Based Optimal Subdata Selection Using Mixture-of-Experts Models to Account for Big Data Heterogeneity
- Award number: 2210546
- Fiscal year: 2022
- Funding amount: $60,000
- Project type: Standard Grant
Collaborative research: A major leap forward: Optimal designs for correlated data, multiple objectives, and multiple covariates
- Award number: 1407518
- Fiscal year: 2014
- Funding amount: $60,000
- Project type: Continuing Grant
Synthesis of glycosyl-novobiocins: probes of Hsp90 C-terminal affinity binding and novel anti-cancer drugs
- Award number: EP/K023071/1
- Fiscal year: 2013
- Funding amount: $60,000
- Project type: Research Grant
CAREER: Optimal Design of Experiments for Generalized Linear Models
- Award number: 1322797
- Fiscal year: 2012
- Funding amount: $60,000
- Project type: Continuing Grant
CAREER: Optimal Design of Experiments for Generalized Linear Models
- Award number: 0748409
- Fiscal year: 2008
- Funding amount: $60,000
- Project type: Continuing Grant
Collaborative Research: Optimal Design of Experiments for Categorical Data
- Award number: 0707013
- Fiscal year: 2007
- Funding amount: $60,000
- Project type: Continuing Grant
Crossover Designs for Comparing Test Treatments with a Control Treatment: Optimality, Efficiency, and Robustness
- Award number: 0600943
- Fiscal year: 2005
- Funding amount: $60,000
- Project type: Standard Grant
Crossover Designs for Comparing Test Treatments with a Control Treatment: Optimality, Efficiency, and Robustness
- Award number: 0304661
- Fiscal year: 2003
- Funding amount: $60,000
- Project type: Standard Grant
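Similar National Natural Science Foundation of China (NSFC) grants
Research on Quantum Field Theory without a Lagrangian Description
- Award number: 24ZR1403900
- Approval year: 2024
- Funding amount: CNY 0
- Project type: Provincial/municipal project
Cell Research
- Award number: 31224802
- Approval year: 2012
- Funding amount: CNY 240,000
- Project type: Special fund project
Cell Research
- Award number: 31024804
- Approval year: 2010
- Funding amount: CNY 240,000
- Project type: Special fund project
Cell Research
- Award number: 30824808
- Approval year: 2008
- Funding amount: CNY 240,000
- Project type: Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
- Award number: 10774081
- Approval year: 2007
- Funding amount: CNY 450,000
- Project type: General Program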
Similar overseas grants
Collaborative Research: Spintronics Enabled Stochastic Spiking Neural Networks with Temporal Information Encoding
- Award number: 2333881
- Fiscal year: 2024
- Funding amount: $60,000
- Project type: Standard Grant
Collaborative Research: Spintronics Enabled Stochastic Spiking Neural Networks with Temporal Information Encoding
- Award number: 2333882
- Fiscal year: 2024
- Funding amount: $60,000
- Project type: Standard Grant
Collaborative Research: Road Information Discovery through Privacy-Preserved Collaborative Estimation in Connected Vehicles
- Award number: 2422579
- Fiscal year: 2024
- Funding amount: $60,000
- Project type: Standard Grant
Collaborative Research: Frameworks: Automated Quality Assurance and Quality Control for the StraboSpot Geologic Information System and Observational Data
- Award number: 2311822
- Fiscal year: 2023
- Funding amount: $60,000
- Project type: Standard Grant
Collaborative Research: SaTC: TTP: Medium: iDRAMA.cloud: A Platform for Measuring and Understanding Information Manipulation
- Award number: 2247867
- Fiscal year: 2023
- Funding amount: $60,000
- Project type: Continuing Grant
Collaborative Research: Frameworks: Automated Quality Assurance and Quality Control for the StraboSpot Geologic Information System and Observational Data
- Award number: 2311821
- Fiscal year: 2023
- Funding amount: $60,000
- Project type: Standard Grant
Collaborative Research: HNDS-R: Polarization, Information Integrity, and Diffusion
- Award number: 2242072
- Fiscal year: 2023
- Funding amount: $60,000
- Project type: Standard Grant
Collaborative Research: SaTC: CORE: Medium: Information Integrity: A User-centric Intervention
- Award number: 2323795
- Fiscal year: 2023
- Funding amount: $60,000
- Project type: Standard Grant
Collaborative Research: Visual Information about surface curvature from patterns of image shading and contours
- Award number: 2238180
- Fiscal year: 2023
- Funding amount: $60,000
- Project type: Standard Grant
Collaborative Research: Visual Information about surface curvature from patterns of image shading and contours
- Award number: 2238179
- Fiscal year: 2023
- Funding amount: $60,000
- Project type: Standard Grant