A reliable and scalable approach to causal inference for large-scale multivariate data

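Basic information

Award number:
1407028
Principal investigator:
Garvesh Raskutti
Amount:
$120,000
Host institution:
University of Wisconsin-Madison
Host institution country:
United States
Project type:
Continuing Grant
Fiscal year:
2014
Funding country:
United States
Start and end dates:
2014-08-15 to 2017-07-31
Project status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1407028&HistoricalAwards=false
Keywords:
reliable scalable approach causal inference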

Project abstract

With masses of large-scale data being generated, a key challenge facing many scientists is to infer relationships amongst variables of interest. In particular, inferring causal or functional relationships amongst genes, proteins, and other biological elements is of fundamental interest to scientists. This project will develop methods for inferring causal or functional relations between genetic, proteomic, and transcriptomic features, both for the ENCODE human genome project and for data on mice with different susceptibility to obesity and diabetes. For both types of data, this project will develop frameworks that comprise: (1) domain knowledge that informs the choice of model and algorithm; (2) fast, parallelizable algorithms with provable run-time guarantees; and (3) statistical consistency guarantees for the algorithms developed under assumptions that are likely to be satisfied in practice.

Directed graphical models or Bayesian networks provide a useful framework for representing causal or functional relationships. A number of algorithms have been developed for inferring directed or Bayesian networks from data. However, prior approaches are either unreliable, as they require assumptions that are rarely satisfied in practice, or do not scale to larger datasets. The proposed project will address this issue by developing algorithms for inferring directed networks with both statistical consistency guarantees and run-time guarantees. The new algorithms will involve exploiting connections between techniques in numerical linear algebra for developing fast solvers of linear systems and concepts in graph theory. Algorithms will be coded in R and will exploit parallel processing. Evaluation will involve both small-scale and large-scale synthetic graphical models with known network structure, real datasets involving yeast data where some of the directions are known, and new biochemistry data in which most of the directions are unknown. Theoretical guarantees on run-time and statistical consistency will be provided using a combination of tools from graph theory, numerical linear algebra, and concentration of measure that the PI has used and developed in prior work.

Project outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Garvesh Raskutti

Network estimation via Poisson autoregressive models

DOI:
Published:
2017
Journal:
IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing
Impact factor:
0
Authors:
Benjamin Mark; Garvesh Raskutti; R. Willett
Corresponding author:
R. Willett