
Causal Structure Learning from Sparse High Dimensional Data


Basic information

Grant number:
RGPIN-2021-02856
Principal investigator:
Ali, Rebecca
Amount:
About $13,100
Host institution:
University of Guelph
Host institution country:
Canada
Program type:
Discovery Grants Program - Individual
Fiscal year:
2022
Funding country:
Canada
Start and end dates:
2022-01-01 to 2023-12-31
Project status:
Completed

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=758895
Keywords:
Causal Structure Learning Sparse Dimensional

Project abstract

This research program studies causal learning in complex natural systems. The foundational challenge lies in inferring links in a high dimensional network based on limited observed data. For example, which genes can best prevent boar taint (an off-taste in pork meat)? Which regions of the brain are associated with cognitive function? Alternatively, what are the plant and pollinator traits driving plant and pollinator interactions? Can we predict interactions? Global concerns such as climate change, disease control, food security, and environmental management are replete with open problems. A general solution does not exist. Our approach is to develop powerful learning algorithms that can help researchers unlock these relationships. Animal breeding programs often require learning the structure of large genetic networks and identifying candidates for genetic selection. Consider boar taint, which is caused by high levels of two compounds. Candidate targets would be genes that reduce levels of one compound without adversely affecting fertility or production traits. The doubly sparse regression incorporating graphical structure of predictors (DSRIG) model leverages the (undirected) graph structure over predictors (gene expression levels) to improve prediction for a quantitative trait (boar taint). However, DSRIG is computationally intensive and does not distinguish between potential predictors that influence the response versus variables influenced by the response. Conservation management programs require learning the drivers of link formation, such as plant and pollinator species traits relevant to pollination. Anticipating which plant and pollinator species are most vulnerable to extinction or identifying species important in structuring the community can inform resource management and allocation efforts. 
Regularized grouped Dirichlet-multinomial (DM) regression is a consumer-resource model that models plant-pollinator interactions as a function of plant and pollinator traits. Unfortunately, survey data often underrepresent or exclude rare interactions. There is no established method to incorporate environmental covariates in the model or compare community structures across networks. The main objectives of the proposed program are to extend (1) the DSRIG framework to the causal (directed) graph setting, and (2) the grouped DM framework to compare networks over space or time. Short-term goals include improving optimization of DSRIG; exploiting the directed structure of the predictor graph for a univariate response; extending the DM regression framework for zero-inflation; and comparing two networks over a (e.g., soil) gradient. Long-term goals include extending DSRIG to the directed multivariate response setting and modelling bipartite networks more broadly over space and time. This research would benefit Canadian genetic selection and animal health monitoring programs as well as inform conservation and resource management practices.
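
For readers unfamiliar with Dirichlet-multinomial regression, the following is a minimal sketch of the general idea, not the program's own model. It assumes a log-linear link from trait covariates to the Dirichlet concentration parameters and substitutes a plain L1 penalty for the grouped penalty named in the abstract. The function names (`dm_log_pmf`, `trait_alphas`, `penalized_nll`) and the argument layout are hypothetical.

```python
import math

def dm_log_pmf(counts, alphas):
    """Log-probability of a count vector under a Dirichlet-multinomial."""
    n = sum(counts)
    a0 = sum(alphas)
    # Multinomial coefficient and normalising terms.
    out = math.lgamma(n + 1) + math.lgamma(a0) - math.lgamma(n + a0)
    # Per-category terms.
    for x, a in zip(counts, alphas):
        out += math.lgamma(x + a) - math.lgamma(a) - math.lgamma(x + 1)
    return out

def trait_alphas(trait_rows, beta):
    """Assumed log-linear link: alpha_k = exp(z_k . beta), one row z_k per category."""
    return [math.exp(sum(z * b for z, b in zip(row, beta))) for row in trait_rows]

def penalized_nll(counts, trait_rows, beta, lam):
    """Negative DM log-likelihood plus an L1 penalty (a stand-in for a group penalty)."""
    alphas = trait_alphas(trait_rows, beta)
    return -dm_log_pmf(counts, alphas) + lam * sum(abs(b) for b in beta)
```

Here `counts` would hold the observed visits of one pollinator across plant species, and each row of `trait_rows` would hold the trait covariates for one plant-pollinator pair. Minimising `penalized_nll` over `beta` with a sparsity-inducing penalty is one way such a model could select the traits that drive interactions.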
