Adaptive Reproducible High-Dimensional Nonlinear Inference for Big Biological Data
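Basic information
- Grant number: 9923688
- Principal investigator:
- Amount: $276,700
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2018
- Funding country: United States
- Project period: 2018-08-01 to 2022-04-30
- Project status: Completed
- Source: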
- Keywords: Address; Algorithms; Archaea; Attention; Bacteria; Big Data; Biological; Bypass; Cells; Colorectal Cancer; Complex; Computer software; Consult; Coupled; Data; Data Set; Development; Dimensions; Disease; Ecosystem; Effectiveness; Environment; Foundations; Frequencies; Gaussian model; Genes; Genetic Materials; Genomics; Healthcare; Human; Internet; Investigation; Joints; Length; Linear Regressions; Literature; Liver Cirrhosis; Mathematics; Metagenomics; Methods; Modeling; Modernization; Molecular; Molecular Sequence Data; Mutation; Neurosciences; Non-Insulin-Dependent Diabetes Mellitus; Non-linear Models; Obesity; Organism; Performance; Planet Earth; Play; Procedures; Reproducibility; Reproducibility of Results; Research; Research Personnel; Role; Sampling; Sampling Studies; Shotguns; Social Sciences; Testing; Theoretical Studies; Tissues; Training; Viral; Virus; Visualization software; Work; base; biological research; computerized tools; contig; dark matter; deep learning; deep learning algorithm; design; flexibility; high dimensionality; human disease; human tissue; improved; interest; learning strategy; machine learning method; metagenomic sequencing; microbial community; microbiome; microbiome research; model design; model development; new technology; novel; power analysis; response; simulation; statistical and machine learning; theories; trait; user-friendly; virus host interaction; virus identification
Project abstract
Big data is now ubiquitous in every field of modern scientific research. Many contemporary applications,
such as the recent national microbiome initiative (NMI), greatly demand highly flexible statistical machine
learning methods that can produce both interpretable and reproducible results. Thus, it is of paramount
importance to identify crucial causal factors that are responsible for the response from a large number of
available covariates, which can be statistically formulated as the false discovery rate (FDR) control in
general high-dimensional nonlinear models. Although shotgun metagenomic studies are widely applied,
most existing investigations concentrate on bacterial organisms. However, viruses
and virus-host interactions play important roles in controlling the functions of the microbial communities. In
addition, viruses have been shown to be associated with complex diseases. Yet, investigations into the
roles of viruses in human diseases are significantly underdeveloped. The objective of this proposal is to
develop mathematically rigorous and computationally efficient approaches for highly complex big
data and to apply these approaches to fundamental and important biological and
biomedical problems. There are four interrelated aims. In Aim 1, we will theoretically investigate the power
of the recently proposed model-free knockoffs (MFK) procedure, which has been theoretically justified to
control FDR in arbitrary models and arbitrary dimensions. We will also theoretically justify the robustness
of MFK with respect to the misspecification of covariate distribution. These studies will lay the foundations
for our developments in other aims. In Aim 2, we will develop deep learning approaches to predict viral
contigs with higher accuracy, integrate our new algorithm with MFK to achieve FDR control for virus motif
discovery, and investigate the power and robustness of our new procedure. In Aim 3, we will take into
account the virus-host motif interactions and adapt our algorithms and theories in Aim 2 for predicting
virus-host infectious interaction status. In Aim 4, we will apply the developed methods from the first three
aims to analyze the shotgun metagenomics data sets in ExperimentHub to identify viruses and virus-host
interactions associated with several diseases at some target FDR level. Both the algorithms and results
will be disseminated through the web. The results from this study will be important for metagenomics
studies under a variety of environments.
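The abstract does not spell out how a knockoff procedure turns feature statistics into an FDR-controlled selection. As a hedged illustration only, the sketch below shows the data-dependent threshold step of the knockoff filter as it is commonly described in the literature. It assumes each covariate already has a statistic W_j (large and positive when the original covariate looks more important than its knockoff copy); how W_j is computed, including any deep learning model, is outside this sketch. The function names `knockoff_threshold` and `knockoff_select` are hypothetical and are not taken from the project.

```python
def knockoff_threshold(w, q, offset=1):
    """Return the smallest t among the nonzero |w_j| such that
    (offset + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= q.

    offset=1 gives the conservative "knockoff+" variant.
    Returns infinity when no candidate threshold satisfies the bound.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # proxy count of false discoveries
        positives = sum(1 for x in w if x >= t)   # features that would be selected
        if (offset + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def knockoff_select(w, q, offset=1):
    """Indices of the features selected at target FDR level q."""
    t = knockoff_threshold(w, q, offset)
    return [j for j, x in enumerate(w) if x >= t]


# Illustrative statistics only; these numbers are made up for the example.
w_stats = [5.0, 4.0, 3.0, 2.5, -1.0, 2.0, 0.5, -0.3, 1.5, 3.5]
selected = knockoff_select(w_stats, q=0.2)
```

Negative statistics act as a stand-in for false positives, so the ratio in `knockoff_threshold` estimates the false discovery proportion at each candidate threshold. With few features or a strict target level, no threshold may qualify, and the selection is then empty.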
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants held by Yingying Fan
Adaptive Reproducible High-Dimensional Nonlinear Inference for Big Biological Data
- Grant number: 9674585
- Fiscal year: 2018
- Funding amount: $276,700
- Project category:
Adaptive Reproducible High-Dimensional Nonlinear Inference for Big Biological Data
- Grant number: 10159277
- Fiscal year: 2018
- Funding amount: $276,700
- Project category:
Adaptive Reproducible High-Dimensional Nonlinear Inference for Big Biological Data
- Grant number: 9753295
- Fiscal year: 2018
- Funding amount: $276,700
- Project category:
Similar overseas grants
DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
- Grant number: EP/Y029089/1
- Fiscal year: 2024
- Funding amount: $276,700
- Project category: Research Grant
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
- Grant number: 2337776
- Fiscal year: 2024
- Funding amount: $276,700
- Project category: Continuing Grant
CAREER: From Dynamic Algorithms to Fast Optimization and Back
- Grant number: 2338816
- Fiscal year: 2024
- Funding amount: $276,700
- Project category: Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
- Grant number: 2338846
- Fiscal year: 2024
- Funding amount: $276,700
- Project category: Continuing Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
- Grant number: 2348261
- Fiscal year: 2024
- Funding amount: $276,700
- Project category: Standard Grant
CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
- Grant number: 2348346
- Fiscal year: 2024
- Funding amount: $276,700
- Project category: Standard Grant
CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
- Grant number: 2348457
- Fiscal year: 2024
- Funding amount: $276,700
- Project category: Standard Grant
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
- Grant number: 2404989
- Fiscal year: 2024
- Funding amount: $276,700
- Project category: Standard Grant
CAREER: Efficient Algorithms for Modern Computer Architecture
- Grant number: 2339310
- Fiscal year: 2024
- Funding amount: $276,700
- Project category: Continuing Grant
CAREER: Improving Real-world Performance of AI Biosignal Algorithms
- Grant number: 2339669
- Fiscal year: 2024
- Funding amount: $276,700
- Project category: Continuing Grant