Adaptive Reproducible High-Dimensional Nonlinear Inference for Big Biological Data

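Basic information

  • Grant number:
    9753295
  • Principal investigator:
  • Amount:
    $279,900
  • Host institution:
  • Host institution country:
    United States
  • Project type:
  • Fiscal year:
    2018
  • Funding country:
    United States
  • Project period:
    2018-08-01 to 2022-04-30
  • Project status:
    Completed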

Project abstract

Big data is now ubiquitous in every field of modern scientific research. Many contemporary applications, such as the recent national microbiome initiative (NMI), greatly demand highly flexible statistical machine learning methods that can produce both interpretable and reproducible results. Thus, it is of paramount importance to identify crucial causal factors that are responsible for the response from a large number of available covariates, which can be statistically formulated as the false discovery rate (FDR) control in general high-dimensional nonlinear models.

Despite the enormous applications of shotgun metagenomic studies, most existing investigations concentrate on the study of bacterial organisms. However, viruses and virus-host interactions play important roles in controlling the functions of the microbial communities. In addition, viruses have been shown to be associated with complex diseases. Yet, investigations into the roles of viruses in human diseases are significantly underdeveloped.

The objective of this proposal is to develop mathematically rigorous and computationally efficient approaches to deal with highly complex big data and the applications of these approaches to solve fundamental and important biological and biomedical problems. There are four interrelated aims.

In Aim 1, we will theoretically investigate the power of the recently proposed model-free knockoffs (MFK) procedure, which has been theoretically justified to control FDR in arbitrary models and arbitrary dimensions. We will also theoretically justify the robustness of MFK with respect to the misspecification of covariate distribution. These studies will lay the foundations for our developments in other aims.

In Aim 2, we will develop deep learning approaches to predict viral contigs with higher accuracy, integrate our new algorithm with MFK to achieve FDR control for virus motif discovery, and investigate the power and robustness of our new procedure.

In Aim 3, we will take into account the virus-host motif interactions and adapt our algorithms and theories in Aim 2 for predicting virus-host infectious interaction status.

In Aim 4, we will apply the developed methods from the first three aims to analyze the shotgun metagenomics data sets in ExperimentHub to identify viruses and virus-host interactions associated with several diseases at some target FDR level. Both the algorithms and results will be disseminated through the web. The results from this study will be important for metagenomics studies under a variety of environments.
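
The FDR control that the abstract attributes to the knockoffs procedure rests on a data-dependent selection threshold. The sketch below illustrates only that final selection step, using the standard "knockoff+" threshold from the knockoff filter literature; it is not the project's own implementation. It assumes the feature statistics `w` have already been computed (for example, as the difference in importance between each covariate and its knockoff copy), and the function names `knockoff_plus_threshold` and `knockoff_select` are hypothetical names chosen for illustration.

```python
def knockoff_plus_threshold(w, q):
    """Return the knockoff+ threshold for statistics w at target FDR level q.

    The threshold is the smallest t among the nonzero |w_j| such that
    (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= q.
    Returns infinity when no candidate satisfies the bound.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # proxy count of false discoveries
        positives = sum(1 for x in w if x >= t)   # number of would-be selections
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def knockoff_select(w, q):
    """Return the indices of covariates whose statistic reaches the threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Illustrative statistics only (made-up numbers, not data from the project).
w = [5.0, 4.0, 3.5, 3.0, 2.5, -0.5, 2.0, 1.5, -1.0, 0.2]
selected = knockoff_select(w, q=0.2)
```

Large positive statistics suggest a covariate matters more than its knockoff, while negative statistics serve as an estimate of how many selections are spurious; the threshold is raised until that estimated proportion falls to the target level `q` or below.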

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Yingying Fan

Adaptive Reproducible High-Dimensional Nonlinear Inference for Big Biological Data
  • Grant number:
    9674585
  • Fiscal year:
    2018
  • Funding amount:
    $279,900
  • Project type:
Adaptive Reproducible High-Dimensional Nonlinear Inference for Big Biological Data
  • Grant number:
    10159277
  • Fiscal year:
    2018
  • Funding amount:
    $279,900
  • Project type:
Adaptive Reproducible High-Dimensional Nonlinear Inference for Big Biological Data
  • Grant number:
    9923688
  • Fiscal year:
    2018
  • Funding amount:
    $279,900
  • Project type:

Similar overseas grants

DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
  • Grant number:
    EP/Y029089/1
  • Fiscal year:
    2024
  • Funding amount:
    $279,900
  • Project type:
    Research Grant
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
  • Grant number:
    2337776
  • Fiscal year:
    2024
  • Funding amount:
    $279,900
  • Project type:
    Continuing Grant
CAREER: From Dynamic Algorithms to Fast Optimization and Back
  • Grant number:
    2338816
  • Fiscal year:
    2024
  • Funding amount:
    $279,900
  • Project type:
    Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
  • Grant number:
    2338846
  • Fiscal year:
    2024
  • Funding amount:
    $279,900
  • Project type:
    Continuing Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
  • Grant number:
    2348261
  • Fiscal year:
    2024
  • Funding amount:
    $279,900
  • Project type:
    Standard Grant
CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
  • Grant number:
    2348346
  • Fiscal year:
    2024
  • Funding amount:
    $279,900
  • Project type:
    Standard Grant
CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
  • Grant number:
    2348457
  • Fiscal year:
    2024
  • Funding amount:
    $279,900
  • Project type:
    Standard Grant
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
  • Grant number:
    2404989
  • Fiscal year:
    2024
  • Funding amount:
    $279,900
  • Project type:
    Standard Grant
CAREER: Efficient Algorithms for Modern Computer Architecture
  • Grant number:
    2339310
  • Fiscal year:
    2024
  • Funding amount:
    $279,900
  • Project type:
    Continuing Grant
CAREER: Improving Real-world Performance of AI Biosignal Algorithms
  • Grant number:
    2339669
  • Fiscal year:
    2024
  • Funding amount:
    $279,900
  • Project type:
    Continuing Grant