Data-driven statistical dynamical modeling: Shortage of training data and high-dimensionality


Basic Information

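  • Award number:
    2207328
  • Principal investigator:
  • Amount:
    $300,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2022
  • Funding country:
    United States
  • Period:
    2022-08-01 to 2025-07-31
  • Status:
    Ongoing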

Project Abstract

Today, machine learning is a prominent scientific computing tool with many practical applications. Notable successes include image classification and the artificial intelligence (AI) Go player that beat the best human player in the world. While these successes are important milestones, there is an emerging need to replicate them in the statistical modeling of time-evolving complex systems, with applications ranging from climate prediction to nanomaterials under external disturbances. The goal of this project is to develop next-generation mathematical and algorithmic tools that overcome two important obstacles to extending machine learning to such problems: a shortage of informative data for effective learning, and expensive computational costs. This objective will be pursued through theoretical and algorithmic developments in computational mathematics that leverage fundamental knowledge from the basic sciences, including geometry, dynamical systems, data science, and statistics. This project will contribute to the NSF mission of advancing STEM through the training of graduate students and through curricular development, namely the design of courses on the mathematical theory of machine learning. In particular, this project will support one graduate student.

The goals of this project are to overcome the shortage of training data and to exploit the manifold assumption to avoid the curse of dimensionality in the statistical modeling of dynamical systems. Beyond uncertainty quantification (UQ) applications, a statistical closure model will be developed to enhance the training of ML-based prediction models when the observed time series is too short for accurate estimation. Specifically, the proposed projects are:

1) To develop a systematic reduced-order statistical closure model. This project extends the recently developed ML-based non-Markovian closure framework to accurately predict statistical responses to unseen external forcings, which is important for UQ.

2) To develop a dimensionality reduction technique that respects the geometry of the data under a manifold assumption on the dynamical variables. The approach includes an accurate Radial Basis Function (RBF) approximation of the Bochner Laplacian from the embedded data. The estimated eigenvector fields will then be used as a frame to represent the vector fields corresponding to the unresolved dynamics. This model reduction framework provides a computationally cheaper alternative to deep learning.

3) To study the theoretical convergence properties of a recently developed algorithm, Bayesian Machine Learning (BML), which uses solutions of a statistically consistent model to enhance the training of a neural network (NN) model for learning non-Markovian dynamics from a short observational time series. This study is motivated by a recent empirical finding that the NN model obtained from the BML training algorithm improves El Niño prediction accuracy by at least two months of lead time compared to the same NN architecture trained with the standard stochastic gradient descent algorithm. The ultimate goal of this study is to evaluate, and develop a theoretical understanding of, how effectively the statistical closure model from Task 1) enhances Bayesian Machine Learning.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
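Task 2) above describes representing the unresolved dynamics in a frame of estimated Bochner Laplacian eigenvector fields. The display below is a minimal sketch of that representation step under stated assumptions, not the project's own formulation. It assumes $M$ is a compact manifold without boundary embedded in $\mathbb{R}^n$ with the induced metric $g$. It takes $\Delta_B$ with the sign convention that makes it nonnegative, so its eigenvector fields $\psi_k$ can be normalized to be orthonormal in $L^2(TM)$. The truncation level $K$ and the Monte Carlo estimate of the coefficients from data points $x_1, \dots, x_N$ (assumed to be sampled uniformly with respect to the volume measure) are illustrative choices.

```latex
\begin{aligned}
&\Delta_B \psi_k = \lambda_k \psi_k, \qquad 0 \le \lambda_1 \le \lambda_2 \le \cdots, \qquad
\langle \psi_j, \psi_k \rangle_{L^2(TM)} = \int_M g\big(\psi_j(x), \psi_k(x)\big)\, dV(x) = \delta_{jk},\\
&u \approx \sum_{k=1}^{K} c_k\, \psi_k, \qquad
c_k = \langle u, \psi_k \rangle_{L^2(TM)} \approx \frac{\operatorname{vol}(M)}{N} \sum_{i=1}^{N} u(x_i) \cdot \psi_k(x_i).
\end{aligned}
```

Here $u(x_i) \cdot \psi_k(x_i)$ is the Euclidean dot product of tangent vectors in $\mathbb{R}^n$, which agrees with $g$ for the induced metric. In a data-driven setting the exact $\psi_k$ would be replaced by their RBF-based estimates at the data points.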

Project Outputs

Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
A data-driven statistical-stochastic surrogate modeling strategy for complex nonlinear non-stationary dynamics
  • DOI:
    10.1016/j.jcp.2023.112085
  • Published:
    2023
  • Journal:
  • Impact factor:
    4.1
  • Authors:
    Qi, Di; Harlim, John
  • Corresponding author:
    Harlim, John
Machine learning-based statistical closure models for turbulent dynamical systems

Other Grants by John Harlim

FRG: Collaborative Research: Non-Smooth Geometry, Spectral Theory, and Data: Learning and Representing Projections of Complex Systems
  • Award number:
    1854299
  • Fiscal year:
    2019
  • Funding amount:
    $300,000
  • Project type:
    Standard Grant
Data-driven Modeling of Equilibrium and Non-equilibrium Statistics
  • Award number:
    1619661
  • Fiscal year:
    2016
  • Funding amount:
    $300,000
  • Project type:
    Standard Grant
Practical Filtering Methods with Model Errors
  • Award number:
    1317919
  • Fiscal year:
    2013
  • Funding amount:
    $300,000
  • Project type:
    Standard Grant

Similar NSFC Grants

Data-driven Recommendation System Construction of an Online Medical Platform Based on the Fusion of Information
  • Award number:
  • Approval year:
    2024
  • Funding amount:
  • Project type:
    Research Fund for International Young Scientists
Research on Cache-Based Remote Timing Attacks
  • Award number:
    60772082
  • Approval year:
    2007
  • Funding amount:
    CNY 280,000
  • Project type:
    General Program

Similar Overseas Grants

Revealing mechanisms of specificity and adaptability in molecular information processing through data-driven models
  • Award number:
    10715575
  • Fiscal year:
    2023
  • Funding amount:
    $300,000
  • Project type:
Identifying genetically driven gene dysregulation in Alzheimer's disease and related dementias using statistical data integration
  • Award number:
    10659349
  • Fiscal year:
    2023
  • Funding amount:
    $300,000
  • Project type:
Data Management and Analysis Core
  • Award number:
    10333814
  • Fiscal year:
    2022
  • Funding amount:
    $300,000
  • Project type:
Unsupervised Statistical Methods for Data-driven Analyses in Spatially Resolved Transcriptomics Data
  • Award number:
    10556351
  • Fiscal year:
    2022
  • Funding amount:
    $300,000
  • Project type:
Resolving single-cell analysis challenges via data-driven decision frameworks and novel statistical methods
  • Award number:
    10707308
  • Fiscal year:
    2022
  • Funding amount:
    $300,000
  • Project type:
Data-driven optimization for DBS programming in temporal lobe epilepsy
  • Award number:
    10574839
  • Fiscal year:
    2022
  • Funding amount:
    $300,000
  • Project type:
Data-Driven Approaches to Identify Biomarkers for Guiding Coronary Artery Bifurcation Lesion Interventions from Patient-Specific Hemodynamic Models
  • Award number:
    10373696
  • Fiscal year:
    2022
  • Funding amount:
    $300,000
  • Project type:
Data Management and Analysis Core
  • Award number:
    10622448
  • Fiscal year:
    2022
  • Funding amount:
    $300,000
  • Project type:
A novel data-driven approach for personalizing smoking cessation pharmacotherapy
  • Award number:
    10437438
  • Fiscal year:
    2022
  • Funding amount:
    $300,000
  • Project type:
Data analysis tools for leveraging massive public data to improve hypothesis-driven research
  • Award number:
    10598130
  • Fiscal year:
    2022
  • Funding amount:
    $300,000
  • Project type: