Adaptive Reproducible High-Dimensional Nonlinear Inference for Big Biological Data
Basic Information
- Grant number: 9674585
- Principal investigator:
- Amount: $289,700
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2018
- Funding country: United States
- Project period: 2018-08-01 to 2022-04-30
- Project status: Completed
- Source:
- Keywords: Address; Algorithms; Archaea; Attention; Bacteria; Big Data; Biological; Bypass; Cells; Colorectal Cancer; Complex; Computer software; Consult; Coupled; Data; Data Set; Development; Dimensions; Disease; Ecosystem; Effectiveness; Environment; Foundations; Frequencies; Gaussian model; Genes; Genetic Materials; Genomics; Healthcare; Human; Internet; Investigation; Joints; Length; Linear Regressions; Literature; Liver Cirrhosis; Machine Learning; Marines; Mathematics; Metagenomics; Methods; Modeling; Modernization; Molecular; Molecular Sequence Data; Mutation; Neurosciences; Non-Insulin-Dependent Diabetes Mellitus; Non-linear Models; Obesity; Organism; Performance; Planet Earth; Play; Procedures; Reproducibility; Reproducibility of Results; Research; Research Personnel; Role; Sampling; Sampling Studies; Shotguns; Social Sciences; Testing; Theoretical Studies; Tissues; Training; Viral; Virus; Visualization software; Work; base; biological research; computerized tools; dark matter; deep learning; design; flexibility; high dimensionality; human disease; human tissue; improved; interest; learning strategy; metagenomic sequencing; microbial community; microbiome; microbiome research; model design; model development; new technology; novel; power analysis; response; simulation; theories; trait; user-friendly; virus host interaction; virus identification
Project Abstract
Big data is now ubiquitous in every field of modern scientific research. Many contemporary applications,
such as the recent national microbiome initiative (NMI), greatly demand highly flexible statistical machine
learning methods that can produce both interpretable and reproducible results. Thus, it is of paramount
importance to identify crucial causal factors that are responsible for the response from a large number of
available covariates, which can be statistically formulated as false discovery rate (FDR) control in
general high-dimensional nonlinear models. Despite the many applications of shotgun metagenomic
studies, most existing investigations concentrate on the study of bacterial organisms. However, viruses
and virus-host interactions play important roles in controlling the functions of the microbial communities. In
addition, viruses have been shown to be associated with complex diseases. Yet, investigations into the
roles of viruses in human diseases are significantly underdeveloped. The objective of this proposal is to
develop mathematically rigorous and computationally efficient approaches for highly complex big
data and to apply these approaches to solve fundamental and important biological and
biomedical problems. There are four interrelated aims. In Aim 1, we will theoretically investigate the power
of the recently proposed model-free knockoffs (MFK) procedure, which has been theoretically justified to
control FDR in arbitrary models and arbitrary dimensions. We will also theoretically justify the robustness
of MFK with respect to the misspecification of covariate distribution. These studies will lay the foundations
for our developments in other aims. In Aim 2, we will develop deep learning approaches to predict viral
contigs with higher accuracy, integrate our new algorithm with MFK to achieve FDR control for virus motif
discovery, and investigate the power and robustness of our new procedure. In Aim 3, we will take into
account the virus-host motif interactions and adapt our algorithms and theories in Aim 2 for predicting
virus-host infectious interaction status. In Aim 4, we will apply the developed methods from the first three
aims to analyze the shotgun metagenomics data sets in ExperimentHub to identify viruses and virus-host
interactions associated with several diseases at a target FDR level. Both the algorithms and results
will be disseminated through the web. The results from this study will be important for metagenomics
studies under a variety of environments.
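
The abstract does not spell out how a knockoff procedure turns feature statistics into a selected set at a target FDR level. As a rough illustration only, the sketch below assumes the standard "knockoff+" thresholding rule from the knockoff filter literature: given one statistic W_j per covariate (large positive values favor the real covariate over its knockoff copy), pick the smallest threshold t whose estimated false discovery proportion (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) is at most q. The function names and the example values of W are hypothetical and are not taken from the project.

```python
def knockoff_threshold(W, q=0.1, offset=1):
    """Smallest t among the nonzero |W_j| whose estimated FDP is <= q.

    offset=1 gives the conservative "knockoff+" rule; returns infinity
    when no threshold satisfies the bound (nothing is selected).
    """
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        negatives = sum(1 for w in W if w <= -t)  # proxy for false discoveries
        positives = sum(1 for w in W if w >= t)   # covariates that would be selected
        if (offset + negatives) / max(positives, 1) <= q:
            return t
    return float("inf")


def knockoff_select(W, q=0.1, offset=1):
    """Indices of covariates whose statistic reaches the threshold."""
    t = knockoff_threshold(W, q, offset)
    return [j for j, w in enumerate(W) if w >= t]


# Hypothetical statistics: eight strong positives, four values near zero.
W = [5.0, 4.2, 3.9, 3.5, 3.1, 2.8, -0.4, 0.3, -0.2, 0.1, 2.5, 2.2]
selected = knockoff_select(W, q=0.2)
```

With these made-up statistics and q = 0.2, the threshold is 2.2 and the eight large positive statistics are selected. At q = 0.1 nothing is selected, because with only twelve covariates the offset of 1 keeps the estimated proportion above 0.1 at every threshold.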
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Grants by Yingying Fan
Adaptive Reproducible High-Dimensional Nonlinear Inference for Big Biological Data
- Grant number: 10159277
- Fiscal year: 2018
- Amount: $289,700
- Project category:
Adaptive Reproducible High-Dimensional Nonlinear Inference for Big Biological Data
- Grant number: 9923688
- Fiscal year: 2018
- Amount: $289,700
- Project category:
Adaptive Reproducible High-Dimensional Nonlinear Inference for Big Biological Data
- Grant number: 9753295
- Fiscal year: 2018
- Amount: $289,700
- Project category:
Similar Overseas Grants
DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
- Grant number: EP/Y029089/1
- Fiscal year: 2024
- Amount: $289,700
- Project category: Research Grant
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
- Grant number: 2337776
- Fiscal year: 2024
- Amount: $289,700
- Project category: Continuing Grant
CAREER: From Dynamic Algorithms to Fast Optimization and Back
- Grant number: 2338816
- Fiscal year: 2024
- Amount: $289,700
- Project category: Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
- Grant number: 2338846
- Fiscal year: 2024
- Amount: $289,700
- Project category: Continuing Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
- Grant number: 2348261
- Fiscal year: 2024
- Amount: $289,700
- Project category: Standard Grant
CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
- Grant number: 2348346
- Fiscal year: 2024
- Amount: $289,700
- Project category: Standard Grant
CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
- Grant number: 2348457
- Fiscal year: 2024
- Amount: $289,700
- Project category: Standard Grant
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
- Grant number: 2404989
- Fiscal year: 2024
- Amount: $289,700
- Project category: Standard Grant
CAREER: Efficient Algorithms for Modern Computer Architecture
- Grant number: 2339310
- Fiscal year: 2024
- Amount: $289,700
- Project category: Continuing Grant
CAREER: Improving Real-world Performance of AI Biosignal Algorithms
- Grant number: 2339669
- Fiscal year: 2024
- Amount: $289,700
- Project category: Continuing Grant