Collaborative Research: Design-Based Optimal Subdata Selection Using Mixture-of-Experts Models to Account for Big Data Heterogeneity
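Basic information
- Award number: 2210546
- Principal investigator:
- Award amount: $150,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-08-15 to 2025-07-31
- Project status: Ongoing
- Source:
- Keywords:
Project abstract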
With technological advances, it has become easy to collect massive amounts of data for most areas of research. But with the size of datasets measured in terabytes or even petabytes, analyzing such datasets can become an expensive computational challenge and may be impossible on a typical desktop or laptop computer. However, for making impactful discoveries, it may be unnecessary to analyze an entire dataset. Consequently, there is great interest in developing and studying methods for selecting a subset from a massive dataset and for drawing conclusions based on the much smaller selected dataset. Such methods are known as subdata selection or subsampling methods. One obvious subsampling method consists of randomly selecting data from the entire dataset. While this is often the simplest and fastest option, it has been established that better options are often available. In this project, the principal investigators (PIs) aim to develop and study a rigorous framework and new methods for optimal subdata selection by using models that account for heterogeneity in the data, which is often present in large datasets. Research findings will be incorporated in topical courses to train graduate students in large-scale data analysis. The work will also be disseminated via the PIs’ collaborations in public health, biomedical science, and business.
Rather than assuming a multiple regression model, the PIs plan to develop and study subdata selection methods based on mixture-of-experts (ME) models, which can account for heterogeneity in the data. The PIs will initially develop and study subdata selection methods for a subclass of the ME models, known as clusterwise linear regression models, for which the gate functions are constant. This will be followed by studying logistic-normal mixture models, in which the gate functions depend on the regression variables. For both cases, the investigators plan to develop information-based optimal subdata selection methods, first for continuous response variables and then for binary response variables, study their statistical properties, and develop efficient algorithms for the methods that will be made available in an R package.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
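The abstract's claim that random subsampling can often be improved on can be illustrated with a minimal Python sketch. It is not the mixture-of-experts method proposed in the project: it assumes an ordinary linear regression model with simulated Gaussian covariates, and the function names (`iboss_style_indices`, `log_det_information`) and all parameter values are hypothetical choices made for this illustration. The rule shown keeps, for each covariate in turn, the rows with the most extreme values, and the two subsets are compared by the log-determinant of the information matrix, where a larger value means more information about the regression coefficients.

```python
import numpy as np

def iboss_style_indices(X, k):
    """Pick about k rows: for each column in turn, take the r smallest and
    r largest values among the rows that have not been selected yet."""
    n, p = X.shape
    r = max(1, k // (2 * p))            # rows taken from each tail of each column
    available = np.ones(n, dtype=bool)  # True for rows not yet selected
    chosen = []
    for j in range(p):
        idx = np.flatnonzero(available)
        order = idx[np.argsort(X[idx, j])]
        picked = np.concatenate([order[:r], order[-r:]])
        chosen.append(picked)
        available[picked] = False
    return np.concatenate(chosen)

def log_det_information(X):
    """log det of Z'Z for a linear model with an intercept column."""
    Z = np.column_stack([np.ones(len(X)), X])
    _, logdet = np.linalg.slogdet(Z.T @ Z)
    return logdet

# Simulated full data (assumption: independent standard normal covariates).
rng = np.random.default_rng(0)
n, p, k = 100_000, 5, 1_000
X = rng.standard_normal((n, p))

random_idx = rng.choice(n, size=k, replace=False)  # uniform random subsample
selected_idx = iboss_style_indices(X, k)           # extreme-value selection

random_logdet = log_det_information(X[random_idx])
selected_logdet = log_det_information(X[selected_idx])
print("random subsample log det:  ", random_logdet)
print("selected subsample log det:", selected_logdet)
```

Under these assumptions the selected subset should show the larger log-determinant, because rows with extreme covariate values contribute more to the information matrix than typical rows. With heterogeneous data, which is the setting the project targets, a single linear model may be a poor fit, and this simple rule is only a point of comparison.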
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Min Yang
Synthesis and Catalytic Activity of Composite Materials TiO2/Ti-Al-MCM-41 by Chemical Vapor Deposition (CVD)
- DOI: 10.4028/www.scientific.net/amr.97-101.1749
- Published: 2010
- Journal:
- Impact factor: 0
- Authors: H. Guan; Xiao Yang; Sheng; Min Yang
- Corresponding author: Min Yang
Hybrid malware detection approach with feedback-directed machine learning
- DOI: 10.1007/s11432-018-9615-8
- Published: 2020
- Journal:
- Impact factor: 0
- Authors: Zhetao Li; Wenli Li; Fuyuan Lin; Yi Sun; Min Yang; Y. Zhang; Zhibo Wang
- Corresponding author: Zhibo Wang
Near-Infrared Spectroscopic Study of Chlorite Minerals
- DOI: 10.1155/2018/6958260
- Published: 2018-02
- Journal:
- Impact factor: 2
- Authors: Min Yang; Meifang Ye
- Corresponding author: Meifang Ye
Multi-Turn Video Question Generation via Reinforced Multi-Choice Attention Network
- DOI: 10.1109/tcsvt.2020.3014775
- Published: 2021-05
- Journal:
- Impact factor: 8.4
- Authors: Zhaoyu Guo; Zhou Zhao; Weike Jin; Zhicheng Wei; Min Yang; Nannan Wang; Nicholas Jing Yuan
- Corresponding author: Nicholas Jing Yuan
Slowing Down the Aging of Learning-based Malware Detectors with API Knowledge
- DOI: 10.1109/tdsc.2022.3144697
- Published: 2022
- Journal:
- Impact factor: 7.3
- Authors: Xiaohan Zhang; Mi Zhang; Yuan Zhang; Ming Zhong; Xin Zhang; Yinzhi Cao; Min Yang
- Corresponding author: Min Yang
Other grants by Min Yang
Collaborative Research: Information-Based Subdata Selection Inspired by Optimal Design of Experiments
- Award number: 1811291
- Fiscal year: 2018
- Award amount: $150,000
- Award type: Standard Grant
Collaborative research: A major leap forward: Optimal designs for correlated data, multiple objectives, and multiple covariates
- Award number: 1407518
- Fiscal year: 2014
- Award amount: $150,000
- Award type: Continuing Grant
Synthesis of glycosyl-novobiocins: probes of Hsp90 C-terminal affinity binding and novel anti-cancer drugs
- Award number: EP/K023071/1
- Fiscal year: 2013
- Award amount: $150,000
- Award type: Research Grant
CAREER: Optimal Design of Experiments for Generalized Linear Models
- Award number: 1322797
- Fiscal year: 2012
- Award amount: $150,000
- Award type: Continuing Grant
CAREER: Optimal Design of Experiments for Generalized Linear Models
- Award number: 0748409
- Fiscal year: 2008
- Award amount: $150,000
- Award type: Continuing Grant
Collaborative Research: Optimal Design of Experiments for Categorical Data
- Award number: 0707013
- Fiscal year: 2007
- Award amount: $150,000
- Award type: Continuing Grant
Crossover Designs for Comparing Test Treatments with a Control Treatment: Optimality, Efficiency, and Robustness
- Award number: 0600943
- Fiscal year: 2005
- Award amount: $150,000
- Award type: Standard Grant
Crossover Designs for Comparing Test Treatments with a Control Treatment: Optimality, Efficiency, and Robustness
- Award number: 0304661
- Fiscal year: 2003
- Award amount: $150,000
- Award type: Standard Grant
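Similar grants from the National Natural Science Foundation of China (NSFC)
Research on Quantum Field Theory without a Lagrangian Description
- Award number: 24ZR1403900
- Year approved: 2024
- Award amount: CNY 0
- Award type: Provincial or municipal project
Cell Research
- Award number: 31224802
- Year approved: 2012
- Award amount: CNY 240,000
- Award type: Special fund project
Cell Research
- Award number: 31024804
- Year approved: 2010
- Award amount: CNY 240,000
- Award type: Special fund project
Cell Research
- Award number: 30824808
- Year approved: 2008
- Award amount: CNY 240,000
- Award type: Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
- Award number: 10774081
- Year approved: 2007
- Award amount: CNY 450,000
- Award type: General Program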
Similar overseas grants
Collaborative Research: Beyond the Single-Atom Paradigm: A Priori Design of Dual-Atom Alloy Active Sites for Efficient and Selective Chemical Conversions
- Award number: 2334970
- Fiscal year: 2024
- Award amount: $150,000
- Award type: Standard Grant
Collaborative Research: Concurrent Design Integration of Products and Remanufacturing Processes for Sustainability and Life Cycle Resilience
- Award number: 2348641
- Fiscal year: 2024
- Award amount: $150,000
- Award type: Standard Grant
Collaborative Research: DMREF: Closed-Loop Design of Polymers with Adaptive Networks for Extreme Mechanics
- Award number: 2413579
- Fiscal year: 2024
- Award amount: $150,000
- Award type: Standard Grant
Collaborative Research: Design and synthesis of hybrid anode materials made of chemically bonded carbon nanotube to copper: a concerted experiment/theory approach
- Award number: 2334039
- Fiscal year: 2024
- Award amount: $150,000
- Award type: Continuing Grant
Collaborative Research: Design: Strengthening Inclusion by Change in Building Equity, Diversity and Understanding (SICBEDU) in Integrative Biology
- Award number: 2335235
- Fiscal year: 2024
- Award amount: $150,000
- Award type: Standard Grant
Collaborative Research: Merging Human Creativity with Computational Intelligence for the Design of Next Generation Responsive Architecture
- Award number: 2329759
- Fiscal year: 2024
- Award amount: $150,000
- Award type: Standard Grant
Collaborative Research: Meshed GNSS-Acoustic Array Design for Lower-Cost Dense Observation Fields
- Award number: 2321297
- Fiscal year: 2024
- Award amount: $150,000
- Award type: Continuing Grant
Collaborative Research: SaTC: CORE: Medium: Differentially Private SQL with flexible privacy modeling, machine-checked system design, and accuracy optimization
- Award number: 2317232
- Fiscal year: 2024
- Award amount: $150,000
- Award type: Continuing Grant
Collaborative Research: Design and synthesis of hybrid anode materials made of chemically bonded carbon nanotube to copper: a concerted experiment/theory approach
- Award number: 2334040
- Fiscal year: 2024
- Award amount: $150,000
- Award type: Continuing Grant
Collaborative Research: DMREF: AI-enabled Automated design of ultrastrong and ultraelastic metallic alloys
- Award number: 2411603
- Fiscal year: 2024
- Award amount: $150,000
- Award type: Standard Grant