Statistical Network Analysis: Model Selection, Differential Privacy, and Dynamic Structures

Basic Information

Project Abstract

In this proposal we tackle some challenging problems in the following three aspects of statistical network analysis.

1. Jittered resampling for selecting network models

The first, and arguably the most important, step in statistical modelling is to choose an appropriate model for a given data set. Many data-driven model-selection methods exist in statistics in general, including those based on data reuse (e.g., bootstrap resampling and cross-validation), but their application to network data is problematic, so it remains common to choose a network model subjectively. The major difficulty in reusing network data is to mimic the underlying probability mechanism. The few existing attempts include cross-validation under some specific settings. We propose a new 'bootstrap jittering' or 'jittered resampling' method for selecting an appropriate network model. The method does not impose any specific forms or conditions, and therefore provides a generic tool for network model selection.

2. Edge differential privacy for network data

In network data, individuals are typically represented by nodes and their inter-relationships by edges, so network data often contain sensitive personal information. On the other hand, the information of interest in the data should be preserved. Hence the primary concern for data privacy is twofold: (a) only a sanitized version of the original network data should be released, to protect privacy, and (b) the sanitized data should preserve the information of interest, so that analysis based on the sanitized data is still meaningful. This is a vibrant research area, as data privacy becomes ever more sensitive and important with the abundance of personal information available in digital format, though the contribution from statistics is still at a preliminary stage. We will adopt the so-called dyadwise randomized response approach. While such a scheme is differentially private, the properties of inference based on the released data are largely unknown. Our initial investigation reveals some attractive features of this approach, suggesting more efficient statistical inference than that based on other data release mechanisms. We will further develop this scheme to handle networks with additional node features or attributes (e.g., social networks with additional information on age, gender, hobby, occupation, etc.).

3. Modelling and forecasting dynamic networks

Most existing statistical inference methods for networks are confined to static network data, though a substantial proportion of real networks are dynamic in nature. Understanding and being able to forecast the changes over time are of immense importance for, e.g., monitoring anomalies in internet traffic networks, predicting demand and setting pricing in electricity supply networks, managing natural resources through environmental readings in sensor networks, and understanding how news and opinion propagate in online social networks. Unfortunately, the development of the foundations for dynamic networks is still in its infancy, and the available modelling and inference tools are sparse. Most available techniques for dealing with dynamic changes of networks are based on evolution analysis of snapshot networks over time, without really modelling the changes dynamically. Although this reflects the fact that most networks change slowly over time, it does not provide any insight into the dynamics underlying the changes and is almost powerless for future prediction, for which it is essential to build appropriate stochastic models that capture dynamic dependence and dynamic changes explicitly. Combining recent developments in tensor decomposition and factor-driven dimension reduction with efficient time series tools such as exponential smoothing and Kalman filters, we will take on this challenge and build some new dynamic models.
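
To make the dyadwise randomized response idea in item 2 concrete, here is a minimal sketch. It assumes the textbook mechanism in which every dyad of an undirected network is flipped independently with probability 1/(1 + e^ε), which gives ε-edge differential privacy. The abstract does not spell out the project's exact release scheme or estimators, so the function names, the flip probability and the debiased edge-density estimator below are illustrative assumptions rather than the authors' method.

```python
import math
import random


def flip_probability(epsilon):
    """Flip probability of standard randomized response (assumed here)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return 1.0 / (1.0 + math.exp(epsilon))


def randomized_response(adj, epsilon, rng):
    """Release a sanitized copy of a symmetric 0/1 adjacency matrix.

    Each dyad (i < j) is flipped independently with probability
    1 / (1 + e^epsilon); the diagonal is left at zero.
    """
    n = len(adj)
    q = flip_probability(epsilon)
    released = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            bit = adj[i][j]
            if rng.random() < q:
                bit = 1 - bit  # flip this dyad
            released[i][j] = released[j][i] = bit
    return released


def edge_density(adj):
    """Fraction of dyads (i < j) that carry an edge."""
    n = len(adj)
    edges = sum(adj[i][j] for i in range(n) for j in range(i + 1, n))
    return edges / (n * (n - 1) / 2)


def debiased_density(released, epsilon):
    """Unbiased density estimate from the released network.

    For each dyad, E[released] = q + (1 - 2q) * true, so we invert
    that linear map on the observed density.
    """
    q = flip_probability(epsilon)
    return (edge_density(released) - q) / (1.0 - 2.0 * q)


if __name__ == "__main__":
    rng = random.Random(2024)
    n = 100
    # Hypothetical test network: independent edges with probability 0.1.
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < 0.1:
                adj[i][j] = adj[j][i] = 1
    released = randomized_response(adj, epsilon=1.0, rng=rng)
    print("true density:    ", edge_density(adj))
    print("released density:", edge_density(released))
    print("debiased density:", debiased_density(released, epsilon=1.0))
```

The raw density of the released network is biased towards 1/2 because of the random flips; the debiasing step illustrates why inference on sanitized data needs to account for the release mechanism, which is the question the proposal sets out to study.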

Project Outcomes

Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
An autocovariance-based learning framework for high-dimensional functional time series
  • DOI:
    10.1016/j.jeconom.2023.01.007
  • Publication date:
    2020-08
  • Journal:
  • Impact factor:
    6.3
  • Authors:
    Jinyuan Chang; Cheng Chen; Xinghao Qiao; Qiwei Yao
  • Corresponding author:
    Qiwei Yao
Day-ahead probabilistic forecasting for French half-hourly electricity loads and quantiles for curve-to-curve regression
  • DOI:
    10.1016/j.apenergy.2021.117465
  • Publication date:
    2021-11
  • Journal:
  • Impact factor:
    11.2
  • Authors:
    Xiuqin Xu; Ying Chen; Y. Goude; Q. Yao
  • Corresponding author:
    Xiuqin Xu; Ying Chen; Y. Goude; Q. Yao
Autoregressive Networks
  • DOI:
  • Publication date:
    2020-10
  • Journal:
  • Impact factor:
    0
  • Authors:
    Binyan Jiang; Jialiang Li; Q. Yao
  • Corresponding author:
    Binyan Jiang; Jialiang Li; Q. Yao

Other publications by Qiwei Yao

Repeated likelihood ratio test for the variance of normal distribution with unknown mean
Modelling Multivariate Volatilities via Latent Common Factors
Blind Source Separation over Space: An Eigenanalysis Approach
  • DOI:
    10.5705/ss.202023.0157
  • Publication date:
    2023
  • Journal:
  • Impact factor:
  • Authors:
    Bo Zhang; Sixing Hao; Qiwei Yao
  • Corresponding author:
    Qiwei Yao
Testing for unit roots based on sample autocovariances
  • DOI:
    10.1093/biomet/asab034
  • Publication date:
    2022
  • Journal:
  • Impact factor:
  • Authors:
    Jinyuan Chang; Guanghui Cheng; Qiwei Yao
  • Corresponding author:
    Qiwei Yao
Testing for unit roots based on sample autocovariances
  • DOI:
    10.1093/biomet/asab034
  • Publication date:
    2020-06
  • Journal:
  • Impact factor:
    2.7
  • Authors:
    Jinyuan Chang; Guanghui Cheng; Qiwei Yao
  • Corresponding author:
    Qiwei Yao

Other grants held by Qiwei Yao

Modelling Vast Time Series: Sparsity and Segmentation
  • Grant number:
    EP/L01226X/1
  • Fiscal year:
    2014
  • Funding amount:
    $638,600
  • Project category:
    Research Grant
High-Dimensional Time Series, Common Factors, and Nonstationarity
  • Grant number:
    EP/H010408/1
  • Fiscal year:
    2010
  • Funding amount:
    $638,600
  • Project category:
    Research Grant

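Similar NSFC (National Natural Science Foundation of China) grants

Multidimensional online cross-language Calling Network modelling and its empirical application in trusted national electronic taxation software
  • Grant number:
    91418205
  • Approval year:
    2014
  • Funding amount:
    CNY 1,700,000
  • Project category:
    Major Research Plan
Research on a distributed operating system based on Wireless Mesh Network
  • Grant number:
    60673142
  • Approval year:
    2006
  • Funding amount:
    CNY 270,000
  • Project category:
    General Program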

Similar overseas grants

Statistical methods for co-expression network analysis of population-scale scRNA-seq data
  • Grant number:
    10740240
  • Fiscal year:
    2023
  • Funding amount:
    $638,600
  • Project category:
Development and innovation of statistical theory and methodology of network meta-analysis
  • Grant number:
    22H03554
  • Fiscal year:
    2022
  • Funding amount:
    $638,600
  • Project category:
    Grant-in-Aid for Scientific Research (B)
Statistical Methods for Network-based Integrative Analysis of Microbiome Data
  • Grant number:
    10708748
  • Fiscal year:
    2022
  • Funding amount:
    $638,600
  • Project category:
Developments of statistical inference, prediction, and modeling methods for network meta-analysis
  • Grant number:
    19H04074
  • Fiscal year:
    2019
  • Funding amount:
    $638,600
  • Project category:
    Grant-in-Aid for Scientific Research (B)
Statistical analysis of network performance and subscriber audio quality
  • Grant number:
    533977-2018
  • Fiscal year:
    2019
  • Funding amount:
    $638,600
  • Project category:
    Experience Awards (previously Industrial Undergraduate Student Research Awards)
ATD: A Statistical Geo-Enabled Dynamic Human Network Analysis
  • Grant number:
    1737885
  • Fiscal year:
    2017
  • Funding amount:
    $638,600
  • Project category:
    Continuing Grant
Statistical Methods for Big Network Flow Data Analysis
  • Grant number:
    1954015
  • Fiscal year:
    2017
  • Funding amount:
    $638,600
  • Project category:
    Studentship
Statistical Models for Genetic Studies, Using Network and Integrative Analysis
  • Grant number:
    9920162
  • Fiscal year:
    2016
  • Funding amount:
    $638,600
  • Project category:
Statistical Models for Genetic Studies, Using Network and Integrative Analysis
  • Grant number:
    10134596
  • Fiscal year:
    2016
  • Funding amount:
    $638,600
  • Project category:
Collaborative Research: Statistical Methodology for Network based Integrative Analysis of Omics Data
  • Grant number:
    1545277
  • Fiscal year:
    2015
  • Funding amount:
    $638,600
  • Project category:
    Standard Grant