Collaborative Research: Inference for Network Models with Covariates: Leveraging Local Information for Statistically and Computationally Efficient Estimation of Global Parameters

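Basic Information

Award number: 1713082
Principal investigator: Purnamrita Sarkar
Amount: approximately $160,000
Host institution: University of Texas at Austin
Institution country: United States
Award type: Standard Grant
Fiscal year: 2017
Funding country: United States
Period: 2017-07-01 to 2021-06-30
Status: Completed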

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1713082&HistoricalAwards=false
Keywords: Collaborative Research Inference Network Models

Project Abstract

Large datasets that are naturally modeled as networks or graphs arise in almost every field of human endeavor. For example, Facebook is a social network in which nodes are users and edges correspond to friendships. In gene networks, nodes represent genes, and connections correspond to co-expression. In ecological networks, nodes are animal species, and edges are determined by who eats whom. A major focus of research on network data has been identifying the community membership of nodes. For scientific purposes, however, it is often more important to examine the nature and evolution of edge and membership probabilities: for instance, how the gene features of individuals change as a function of some unknown factor such as a disease. Using other measured features of nodes and edges can add decisively to the information available from the observed edges, or interactions, between nodes. Such features could be disease symptoms or test results, or demographic information about users in a social network. Despite its importance, statistical inference in such models has only just begun to be studied. There are both theoretical and computational challenges, arising from the complexity of the fitted models and the size of the datasets. The research will lead to algorithms for fitting these models and to statistical measures of confidence, with potential applications in many fields. The research focuses on block models for graphs in which node or edge covariates are present. Once covariates are included, these models are no longer pure block models. Instead, membership probabilities depend on covariates, and connection probabilities depend on both block membership and individual covariates. Fitting algorithms alternate between fitting block parameters and covariate parameters.
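The alternating scheme can be sketched in Python under assumptions that are ours, not the PIs'. The sketch uses a hypothetical logistic model P(A_ij = 1) = sigmoid(theta[z_i][z_j] + beta * x_ij) with one symmetric edge covariate x_ij. Hard block labels z are updated by iterated conditional modes, and the block matrix theta and covariate coefficient beta are updated by backtracking gradient ascent. All function names (`alternating_fit`, `ascend`, `update_labels`, `simulate`) and update rules are illustrative. The sketch also omits the covariate-dependent membership probabilities that the abstract describes.

```python
import math
import random


def softplus(t):
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))  # log(1 + e^t), overflow-safe


def sigmoid(t):
    return math.exp(-softplus(-t))  # 1 / (1 + e^-t), overflow-safe


def log_lik(A, X, z, theta, beta):
    # Bernoulli log-likelihood over pairs i < j under the assumed (hypothetical) logit model
    n, total = len(A), 0.0
    for i in range(n):
        for j in range(i + 1, n):
            eta = theta[z[i]][z[j]] + beta * X[i][j]
            total += A[i][j] * eta - softplus(eta)
    return total


def gradient(A, X, z, theta, beta, K):
    # Partial derivatives w.r.t. theta[a][b] (a <= b) and beta
    n, g_theta, g_beta = len(A), [[0.0] * K for _ in range(K)], 0.0
    for i in range(n):
        for j in range(i + 1, n):
            a, b = sorted((z[i], z[j]))
            r = A[i][j] - sigmoid(theta[a][b] + beta * X[i][j])
            g_theta[a][b] += r
            g_beta += r * X[i][j]
    return g_theta, g_beta


def ascend(A, X, z, theta, beta, K, which, steps=15):
    # Backtracking gradient ascent on theta only ("block") or beta only ("covariate")
    cur = log_lik(A, X, z, theta, beta)
    for _ in range(steps):
        g_theta, g_beta = gradient(A, X, z, theta, beta, K)
        if which == "block":
            g_beta = 0.0
        else:
            g_theta = [[0.0] * K for _ in range(K)]
        step = 1.0
        while step > 1e-8:
            new_theta = [[theta[k][l] + step * g_theta[min(k, l)][max(k, l)]
                          for l in range(K)] for k in range(K)]
            new_beta = beta + step * g_beta
            new = log_lik(A, X, z, new_theta, new_beta)
            if new >= cur:  # accept only steps that do not lower the log-likelihood
                theta, beta, cur = new_theta, new_beta, new
                break
            step /= 2.0
    return theta, beta


def update_labels(A, X, z, theta, beta, K):
    # Iterated conditional modes: each node moves to its best block, others fixed
    n = len(A)
    for i in range(n):
        scores = []
        for k in range(K):
            s = 0.0
            for j in range(n):
                if j != i:
                    eta = theta[k][z[j]] + beta * X[i][j]
                    s += A[i][j] * eta - softplus(eta)
            scores.append(s)
        z[i] = scores.index(max(scores))
    return z


def alternating_fit(A, X, K, rounds=4, seed=1):
    rng = random.Random(seed)
    z = [rng.randrange(K) for _ in range(len(A))]  # random initial labels
    theta, beta = [[0.0] * K for _ in range(K)], 0.0
    history = [log_lik(A, X, z, theta, beta)]
    for _ in range(rounds):
        # Block step: connectivity theta, then memberships z (beta held fixed)
        theta, beta = ascend(A, X, z, theta, beta, K, "block")
        z = update_labels(A, X, z, theta, beta, K)
        # Covariate step: coefficient beta (block parameters held fixed)
        theta, beta = ascend(A, X, z, theta, beta, K, "covariate")
        history.append(log_lik(A, X, z, theta, beta))
    return z, theta, beta, history


def simulate(n, theta, beta, seed=0):
    rng = random.Random(seed)
    z = [rng.randrange(len(theta)) for _ in range(n)]
    X, A = [[0.0] * n for _ in range(n)], [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            X[i][j] = X[j][i] = rng.gauss(0.0, 1.0)  # symmetric edge covariate
            p = sigmoid(theta[z[i]][z[j]] + beta * X[i][j])
            A[i][j] = A[j][i] = int(rng.random() < p)
    return A, X, z


A, X, z_true = simulate(30, [[1.0, -2.0], [-2.0, 1.0]], beta=1.5)
z_hat, theta_hat, beta_hat, history = alternating_fit(A, X, K=2)
print("beta_hat:", round(beta_hat, 2))
```

Every update accepts only moves that do not lower the log-likelihood, so the recorded `history` is non-decreasing. The scheme can still stop at a poor local optimum, which is why several random starts would normally be used.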
The project examines variational (mean-field) approaches, which effectively amount to semiparametric model fitting with nK membership "nuisance" parameters, where n is the number of nodes and K is the number of communities. Because the PIs have found these approaches to be unstable for large n, they have already begun to investigate the theoretical and practical aspects of divide-and-conquer algorithms, in which many subgraphs are fit independently. The PIs will study the statistical properties of these methods, both asymptotically and through simulations, and develop practical, computationally stable methods for large, relatively sparse graphs.
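The divide-and-conquer idea can be sketched as follows, again under assumptions that are ours rather than the PIs'. The sketch fits a plain K-block stochastic block model without covariates, to keep it short. Each random induced subgraph is fit independently with a standard mean-field variational EM, in which the m x K matrix `tau` holds the membership "nuisance" parameters for that subgraph. The per-subgraph estimates are then combined by simple averaging. The averaged summaries (mean within-block and mean between-block connection probability) are a hypothetical choice, picked because they do not change under label switching. All function names are illustrative.

```python
import math
import random


def simulate_sbm(n, B, pi, rng):
    # Plain K-block SBM: labels drawn from pi, edges from B[z_i][z_j]
    z = rng.choices(range(len(pi)), weights=pi, k=n)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < B[z[i]][z[j]]:
                A[i][j] = A[j][i] = 1
    return A


def m_step(A, tau, K, eps):
    # Closed-form updates of block proportions pi and connectivity B given tau
    m = len(A)
    pi = [max(sum(t[k] for t in tau) / m, eps) for k in range(K)]
    B = [[eps] * K for _ in range(K)]
    for k in range(K):
        for l in range(K):
            num = den = 0.0
            for i in range(m):
                for j in range(m):
                    if i != j:
                        w = tau[i][k] * tau[j][l]
                        num += w * A[i][j]
                        den += w
            if den > 0:
                B[k][l] = min(max(num / den, eps), 1.0 - eps)
    return pi, B


def e_step(A, tau, pi, B):
    # One sweep of mean-field fixed-point updates of the m x K matrix tau
    K = len(pi)
    logB = [[math.log(b) for b in row] for row in B]
    log1mB = [[math.log(1.0 - b) for b in row] for row in B]
    for i in range(len(A)):
        logs = []
        for k in range(K):
            s = math.log(pi[k])
            for j in range(len(A)):
                if j != i:
                    row = logB[k] if A[i][j] else log1mB[k]
                    s += sum(tau[j][l] * row[l] for l in range(K))
            logs.append(s)
        top = max(logs)  # subtract the max before exponentiating
        w = [math.exp(v - top) for v in logs]
        total = sum(w)
        tau[i] = [v / total for v in w]


def mean_field_sbm(A, K, rng, n_iter=20, eps=1e-6):
    tau = []
    for _ in range(len(A)):
        w = [rng.random() + 1e-3 for _ in range(K)]  # random soft initial labels
        tau.append([v / sum(w) for v in w])
    for _ in range(n_iter):
        pi, B = m_step(A, tau, K, eps)
        e_step(A, tau, pi, B)
    return m_step(A, tau, K, eps)


def divide_and_conquer(A, K, n_sub, m, seed=0):
    # Fit n_sub random induced subgraphs of m nodes independently, then average
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sub):
        nodes = rng.sample(range(len(A)), m)
        sub = [[A[i][j] for j in nodes] for i in nodes]
        _, B = mean_field_sbm(sub, K, rng)
        # Label-switching-invariant summaries (requires K >= 2)
        within = sum(B[k][k] for k in range(K)) / K
        between = sum(B[k][l] for k in range(K) for l in range(K) if k != l) / (K * (K - 1))
        estimates.append((within, between))
    agg = (sum(w for w, _ in estimates) / n_sub, sum(b for _, b in estimates) / n_sub)
    return agg, estimates


rng = random.Random(42)
A = simulate_sbm(200, [[0.3, 0.05], [0.05, 0.3]], [0.5, 0.5], rng)
agg, per_sub = divide_and_conquer(A, K=2, n_sub=5, m=40)
print("within, between:", agg)
```

Each subgraph fit carries only m x K variational parameters instead of n x K, and the fits are independent, so they could run in parallel. Whether such averaged estimates are consistent, and how to attach measures of confidence to them, is the kind of statistical question the project proposes to study.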
