
Statistical Network Analysis: Model Selection, Differential Privacy, and Dynamic Structures

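Basic information

Grant number:
EP/V007556/1
Principal investigator:
Qiwei Yao
Amount:
$638,600
Host institution:
London School of Economics and Political Science
Host institution country:
United Kingdom
Project category:
Research Grant
Fiscal year:
2021
Funding country:
United Kingdom
Start and end dates:
2021 to (no data)
Project status:
Not yet completed

Source:
https://gtr.ukri.org/projects?ref=EP%2FV007556%2F1
Keywords: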
Statistical Network Analysis Model Selection

Project abstract

In this proposal we tackle some challenging problems in the following three aspects of statistical network analysis.

1. Jittered resampling for selecting network models

The first and arguably the most important step in statistical modelling is to choose an appropriate model for a given data set. While statistics in general offers many data-driven model-selection methods, including those based on data reuse (e.g., bootstrap resampling, cross-validation), their application to network data is problematic. It therefore remains common to choose a network model subjectively. The major difficulty in reusing network data is to mimic the underlying probability mechanisms. The few existing attempts include cross-validation under some specific settings. We propose a new 'bootstrap jittering' or 'jittered resampling' method for selecting an appropriate network model. The method does not impose any specific forms or conditions, and therefore provides a generic tool for network model selection.

2. Edge differential privacy for network data

In network data, individuals are typically represented by nodes and their inter-relationships by edges. Network data therefore often contain sensitive personal information. On the other hand, the information of interest in the data should be preserved. Hence the primary concern for data privacy is twofold: (a) to release only a sanitized version of the original network data to protect privacy, and (b) to ensure that the sanitized data preserve the information of interest, so that analysis based on the sanitized data is still meaningful. This is a vibrant research area, as data privacy becomes increasingly sensitive and important given the abundance of personal information available in digital form, though the contribution from statistics is still at a preliminary stage. We will adopt the so-called dyadwise randomized response approach. While such a scheme is differentially private, the properties of inference based on the released data are largely unknown. Our initial investigation reveals some attractive features of this approach, suggesting more efficient statistical inference than that based on other data release mechanisms. We will further develop this scheme to handle networks with additional node features or attributes (e.g., social networks with additional information on age, gender, hobby, occupation, etc.).

3. Modelling and forecasting dynamic networks

Most existing statistical inference methods for networks are confined to static network data, though a substantial proportion of real networks are dynamic in nature. Understanding and being able to forecast changes over time is of immense importance for, e.g., monitoring anomalies in internet traffic networks, predicting demand and setting prices in electricity supply networks, managing natural resources using environmental readings from sensor networks, and understanding how news and opinion propagate in online social networks. Unfortunately, the development of the foundations for dynamic networks is still in its infancy, and the available modelling and inference tools are sparse. Most available techniques for dealing with dynamic changes in networks analyse the evolution of snapshot networks over time without really modelling the changes dynamically. Although this reflects the fact that most networks change slowly over time, it does not provide any insight into the dynamics underlying the changes and is almost powerless for future prediction, for which it is essential to build appropriate stochastic models that capture dynamic dependence and dynamic changes explicitly. Combining recent developments in tensor decomposition and factor-driven dimension reduction with efficient time series tools such as exponential smoothing and Kalman filters, we will take on this challenge and build some new dynamic models.
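
As an illustration of the dyadwise randomized response idea named in part 2, the following is a minimal Python sketch. It is not the proposal's own procedure, which the abstract does not spell out. It assumes the textbook form of the mechanism: every dyad of an undirected 0/1 network is flipped independently with probability 1 / (1 + exp(epsilon)), which makes the release epsilon-edge differentially private. The function names (`flip_probability`, `randomized_response`, `debiased_density`) and the edge-density example are hypothetical and chosen only for illustration.

```python
import math
import random


def flip_probability(epsilon):
    """Flip probability under which per-dyad randomized response is epsilon-edge-DP."""
    return 1.0 / (1.0 + math.exp(epsilon))


def randomized_response(adj, epsilon, rng):
    """Return a sanitized copy of a symmetric 0/1 adjacency matrix (list of lists).

    Each dyad {i, j} with i < j is flipped independently with probability
    1 / (1 + exp(epsilon)); the diagonal is kept at zero.
    """
    n = len(adj)
    p = flip_probability(epsilon)
    released = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            bit = adj[i][j]
            if rng.random() < p:
                bit = 1 - bit  # flip this dyad
            released[i][j] = released[j][i] = bit
    return released


def edge_density(adj):
    """Fraction of dyads that carry an edge."""
    n = len(adj)
    edges = sum(adj[i][j] for i in range(n) for j in range(i + 1, n))
    return edges / (n * (n - 1) / 2)


def debiased_density(released, epsilon):
    """Unbiased estimate of the original edge density from the released network.

    If d is the true density and p the flip probability, the released density
    has expectation p + d * (1 - 2p); solving for d gives the estimator below.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive for the estimate to be defined")
    p = flip_probability(epsilon)
    return (edge_density(released) - p) / (1.0 - 2.0 * p)


if __name__ == "__main__":
    rng = random.Random(2021)
    n = 200
    # Hypothetical input: a random undirected graph with edge probability 0.1.
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = adj[j][i] = 1 if rng.random() < 0.1 else 0

    released = randomized_response(adj, epsilon=1.0, rng=rng)
    print("true density:     ", edge_density(adj))
    print("released density: ", edge_density(released))
    print("debiased estimate:", debiased_density(released, epsilon=1.0))
```

The raw density of the released network is biased towards 1/2, which is why an analyst working from the sanitized data needs a correction such as `debiased_density`. Smaller values of epsilon give stronger privacy but a noisier corrected estimate.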
