Parallel Ensemble Learning and Feature Interaction Discovery: High Volume Dynamic Data

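Basic information

  • Award number:
    1953191
  • Principal investigator:
  • Amount:
    $452,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2020
  • Funding country:
    United States
  • Start and end dates:
    2020-08-15 to 2023-07-31
  • Project status:
    Completed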

Project abstract

In the digital age, advances in technology have enabled data collection at an unprecedented pace, including the collection of many kinds of dynamic data over time. Such dynamic data potentially holds the key to many open questions in science, such as how genes interact with each other during development in both Drosophila and humans. However, dynamic data is notoriously challenging to analyze because of its changing nature and its massive size. The PI plans to enhance the modeling toolbox for dynamic data by designing scalable parallel algorithms that aim at both high prediction accuracy and high interpretability through decision-tree-based methods. Applications range across many fields, including computational biology and precision medicine. During the course of the proposed research, graduate students will receive training in domain-driven data science and open-source software development. The research will be further disseminated through an upcoming book, undergraduate- and graduate-level courses, and presentations at workshops and conferences.
High-volume dynamic data poses challenges to model training because the underlying data distribution varies with time. Algorithms and models, as well as their interpretations, have to adapt to the changing dynamics. Among statistical and machine learning methods, decision-tree-based ensembles are especially well suited to large volumes of dynamic data because they can capture flexible non-linear relationships and are easy to interpret, so that people can extract useful narratives and information. The PI's prior work, such as iterative Random Forests (iRF) and signed iterative Random Forests (siRF), identifies stable, high-order biomolecular interactions that help explain the methods' high predictive accuracy, but it focuses only on cross-sectional data at a fixed time point. The proposed research will build on iRF and siRF to develop enhanced Random Forest and iRF algorithms for modeling high-volume dynamic data with interpretable high-order feature interactions. The PI will 1) develop a communication-efficient parallel RF training algorithm (pRF) that can efficiently take advantage of a large number of machines; 2) propose a novel method, dynamic iterative Random Forests (diRF), that discovers feature interactions in dynamic data in the presence of concept drift; and 3) carry out a theoretical analysis of the pRF and diRF algorithms under time-varying change-detection models where local stationarity conditions are satisfied.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
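The two ideas named in the abstract, training tree ensembles in parallel and coping with concept drift, can be illustrated with a generic sketch. This is not the PI's pRF or diRF: it is a minimal toy using only the Python standard library, in which each "tree" is a one-split stump fitted on its own bootstrap sample in a worker thread, and drift is handled by the simple (assumed) strategy of refitting on the most recent window of data. The names `fit_stump`, `fit_forest`, `predict`, and `fit_on_recent_window` are hypothetical.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def fit_stump(rows, labels, rng):
    """Fit a one-split tree (stump) on a bootstrap sample of the data."""
    n = len(rows)
    idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
    feature = rng.randrange(len(rows[0]))           # one random feature per stump
    best = None
    for i in idx:                                   # try each sampled value as a threshold
        threshold = rows[i][feature]
        left = [labels[j] for j in idx if rows[j][feature] <= threshold]
        right = [labels[j] for j in idx if rows[j][feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]                                 # (feature, threshold, left, right)


def fit_forest(rows, labels, n_trees=25, workers=4, seed=0):
    """Fit the stumps independently; each task gets its own seeded generator."""
    rngs = [random.Random(seed + t) for t in range(n_trees)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda rng: fit_stump(rows, labels, rng), rngs))


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(
        left if row[feature] <= threshold else right
        for feature, threshold, left, right in forest
    )
    return votes.most_common(1)[0][0]


def fit_on_recent_window(rows, labels, window, **kwargs):
    """Naive drift handling (an assumption of this sketch): refit on the latest rows only."""
    return fit_forest(rows[-window:], labels[-window:], **kwargs)
```

Because the stumps share no state, the same structure carries over to a process pool or to multiple machines; threads are used here only to keep the example short, and in standard CPython they will not speed up CPU-bound fitting. Calling `fit_on_recent_window(rows, labels, window=40)` on a stream whose labeling rule changed partway through yields a forest that reflects only the most recent rule.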

Project outcomes

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Stable Discovery of Interpretable Subgroups via Calibration in Causal Studies
  • DOI:
    10.1111/insr.12427
  • Publication date:
    2020-12-22
  • Journal:
  • Impact factor:
    2
  • Authors:
    Dwivedi, Raaz;Tan, Yan Shuo;Yu, Bin
  • Corresponding author:
    Yu, Bin

Other publications by Bin Yu

Images of China: An Empirical Study of Western Tourist Material
  • DOI:
  • Publication date:
    2012
  • Journal:
  • Impact factor:
    0
  • Authors:
    Ying Sun;Bin Yu
  • Corresponding author:
    Bin Yu
Machine perfusion combined with antibiotics prevents donor‐derived infections caused by multidrug‐resistant bacteria
  • DOI:
    10.1111/ajt.17032
  • Publication date:
    2022
  • Journal:
  • Impact factor:
    8.8
  • Authors:
    Han Liang;Peng Zhang;Bin Yu;Zhongzhong Liu;Li Pan;Xueyu He;Xiaoli Fan;Yanfeng Wang
  • Corresponding author:
    Yanfeng Wang
Nonparametric sparse hierarchical models describe V1 fMRI responses to natural images
  • DOI:
  • Publication date:
    2008
  • Journal:
  • Impact factor:
    0
  • Authors:
    Pradeep Ravikumar;Vincent Q. Vu;Bin Yu;Thomas Naselaris;Kendrick Norris Kay;J. Gallant
  • Corresponding author:
    J. Gallant
Scaling vortex breakdown mechanism based on viscous effect in shock cylindrical bubble interaction
  • DOI:
    10.1063/1.5051463
  • Publication date:
    2018-12
  • Journal:
  • Impact factor:
    4.6
  • Authors:
    Zi'ang Wang;Bin Yu;Hao Chen;Bin Zhang;Hong Liu
  • Corresponding author:
    Hong Liu
Readiness of as-built horizontal curved roads for LiDAR-based automated vehicles: A virtual simulation analysis
  • DOI:
    10.1016/j.aap.2022.106762
  • Publication date:
    2022
  • Journal:
  • Impact factor:
    5.9
  • Authors:
    Shuyi Wang;Yang Ma;Jinzhou Liu;Bin Yu;Feng Zhu
  • Corresponding author:
    Feng Zhu

Other grants by Bin Yu

Advancing Theory and Methodology for Tree-Based Algorithms in High Dimensions
  • Award number:
    2209975
  • Fiscal year:
    2022
  • Funding amount:
    $452,000
  • Project type:
    Standard Grant
Understanding Complexity and the Bias-Variance Tradeoff in High Dimensions: Theory and Data Evidence
  • Award number:
    2015341
  • Fiscal year:
    2020
  • Funding amount:
    $452,000
  • Project type:
    Standard Grant
Understand the functional mechanism of the DSP1 complex in the 3' end maturation of plant small nuclear RNAs
  • Award number:
    1818082
  • Fiscal year:
    2018
  • Funding amount:
    $452,000
  • Project type:
    Standard Grant
BIGDATA: F: Scalable and Interpretable Machine Learning: Bridging Mechanistic and Data-Driven Modeling in the Biological Sciences
  • Award number:
    1741340
  • Fiscal year:
    2017
  • Funding amount:
    $452,000
  • Project type:
    Standard Grant
Canonical Linear Methods and Hierarchical Non-Linear Methods in High-Dimensional Statistics
  • Award number:
    1613002
  • Fiscal year:
    2016
  • Funding amount:
    $452,000
  • Project type:
    Continuing Grant
Smart Nanofabrication via Rational Assembly of Two-Dimensional Heterosystems
  • Award number:
    1434689
  • Fiscal year:
    2014
  • Funding amount:
    $452,000
  • Project type:
    Standard Grant
Collaborative Research: Leverage Subsampling for Regression and Dimension Reduction
  • Award number:
    1228246
  • Fiscal year:
    2012
  • Funding amount:
    $452,000
  • Project type:
    Standard Grant
Direct Self-Assembly of Large Area, High Crystallinity 2D Graphene on Insulator: An Integratable Carbon Platform
  • Award number:
    1162312
  • Fiscal year:
    2012
  • Funding amount:
    $452,000
  • Project type:
    Standard Grant
Understanding DAWDLE Function in miRNA and siRNA Biogenesis
  • Award number:
    1121193
  • Fiscal year:
    2011
  • Funding amount:
    $452,000
  • Project type:
    Continuing Grant
Ultra-Low-Power Complementary Logic with On-Chip Directly Assembled, Highly Adaptive 2-D Graphitic Platform
  • Award number:
    1002228
  • Fiscal year:
    2010
  • Funding amount:
    $452,000
  • Project type:
    Standard Grant

Similar overseas grants

Development and spectral analysis of an ensemble machine learning model using quantum chemical descriptors
  • Award number:
    23K04678
  • Fiscal year:
    2023
  • Funding amount:
    $452,000
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Predicting maladaptive aversive learning via computational modeling of insular single cell ensemble activity patterns
  • Award number:
    10575313
  • Fiscal year:
    2023
  • Funding amount:
    $452,000
  • Project type:
An ensemble deep learning model for tumor bud detection and risk stratification in colorectal carcinoma.
  • Award number:
    10564824
  • Fiscal year:
    2023
  • Funding amount:
    $452,000
  • Project type:
Improving flexibility and performance of the Acute Care Enhanced Surveillance (ACES) System for public health surveillance: an ensemble of state-of-the-art machine learning and rule-based natural language processing methods
  • Award number:
    468864
  • Fiscal year:
    2022
  • Funding amount:
    $452,000
  • Project type:
    Operating Grants
Dynamic ensemble selection for data streams and multi-view learning
  • Award number:
    RGPIN-2021-04130
  • Fiscal year:
    2022
  • Funding amount:
    $452,000
  • Project type:
    Discovery Grants Program - Individual
Integrated Ensemble Learning with Embedded Vectors in Authorship Attribution
  • Award number:
    22K12726
  • Fiscal year:
    2022
  • Funding amount:
    $452,000
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Statistical machine learning tools for understanding neural ensemble representations and dynamics
  • Award number:
    10510107
  • Fiscal year:
    2022
  • Funding amount:
    $452,000
  • Project type:
Elucidating the role of Arc in determining cell ensemble dynamics underlying associative learning
  • Award number:
    22K15199
  • Fiscal year:
    2022
  • Funding amount:
    $452,000
  • Project type:
    Grant-in-Aid for Early-Career Scientists
Dopamine, Synaptic Plasticity and Striatal Ensemble Dynamics Underlying Motor Learning
  • Award number:
    10621912
  • Fiscal year:
    2021
  • Funding amount:
    $452,000
  • Project type:
Dynamic ensemble selection for data streams and multi-view learning
  • Award number:
    DGECR-2021-00309
  • Fiscal year:
    2021
  • Funding amount:
    $452,000
  • Project type:
    Discovery Launch Supplement