CAREER: High-Dimensional Statistical Models for Unsupervised Learning

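Basic Information

  • Award number:
    1945667
  • Principal investigator:
  • Amount:
    $400,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2020
  • Funding country:
    United States
  • Start and end dates:
    2020-07-15 to 2025-06-30
  • Project status:
    Not yet concluded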

Project Abstract

The growing awareness of the importance of data and data analysis, coupled with the unprecedented growth in the amount of data in recent years, has led to concerted efforts by researchers in the fields now collectively referred to as data science to develop new models capable of handling big, complex datasets. The vast majority of the available data is unlabeled, which makes the modeling problem more challenging. This project will advance the modeling of big, complex unlabeled data. The focus will be on learning from network data as well as learning dependency structures from regular data. Some of the concrete problems investigated are: What can an epidemic spreading over a network tell us about the structure of the network and the origin of the epidemic? What can the structure of the network tell us about the latent features of the nodes, for example, their grouping, or community in the case of social networks? Are there more refined structures in real networks beyond simple grouping or community structure? Can we learn complex networks from regular data that tell us about the nature of the dependency among the underlying variables (for example, which variables are the causes of a given variable)? How well do these often complex models fit real data? Progress on these questions has a direct impact on many scientific domains dealing with data. For example, genomics and computational biology, neuroscience, epidemiology, network security, the social sciences, and marketing all benefit from advances in network analysis. Advances in dependency structure learning can improve causal inference procedures, with impact on all scientific fields. The work on network epidemics has the potential to be transformative, with immediate applications to the public health domain.
This project advances the state of the art in inferring complex relations from data in an unsupervised fashion. As a result, network inference and graphical modeling will play prominent roles in our approach. We will consider four main tasks: 1) Developing goodness-of-fit tests for structured network models, in particular those used in community detection and clustering. Despite advances in network modeling, there are concerns that current models are not capturing the complexity of real networks. A first step toward realistic network modeling is developing tools for assessing how well the models fit. 2) Advancing the state of the art in modeling complex networks, presenting ideas on capturing self-similarity in real networks as well as hierarchical statistical models for multilayer networks. 3) Advancing inference based on network dynamics: Many networks are accompanied by dynamics governed by the network structure, e.g., the spread of rumors and diseases. We often observe the result of the dynamics (who gets infected over time) and would like to make inferences about the origin of the dynamics or the structure of the underlying network. We will address the challenges in dealing with these questions in real networks, where the presence of many cycles and incomplete information about the dynamics pose serious difficulties. 4) Advancing inference of high-dimensional dependency structures: Characterizing dependencies (correlation, causation, etc.) among a collection of random variables is a fundamental task of statistical analysis. The principal investigator will explore learning high-dimensional directed graphical models from data that are suitable for causal interpretations.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (5)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
On perfectness in Gaussian graphical models
Hierarchical Stochastic Block Model for Community Detection in Multiplex Networks
  • DOI:
    10.1214/22-ba1355
  • Published:
    2019-03
  • Journal:
  • Impact factor:
    0
  • Authors:
    M. Paez;A. Amini;Lizhen Lin
  • Corresponding authors:
    M. Paez;A. Amini;Lizhen Lin
Label consistency in overfitted generalized $k$-means
  • DOI:
  • Published:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Linfan Zhang;A. Amini
  • Corresponding authors:
    Linfan Zhang;A. Amini
The Potts-Ising model for discrete multivariate data
  • DOI:
  • Published:
    2020
  • Journal:
  • Impact factor:
    0
  • Authors:
    Zahra S. Razaee;A. Amini
  • Corresponding authors:
    Zahra S. Razaee;A. Amini
Statistical Guarantees for Consensus Clustering
  • DOI:
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Zhixin Zhou;Gautam Dudeja;A. Amini
  • Corresponding authors:
    Zhixin Zhou;Gautam Dudeja;A. Amini

Other publications by Arash Amini

Stationary Processes on Directed Graphs
Two non-convex optimization approaches for joint transmit waveform and receive filter design
  • DOI:
    10.1016/j.sigpro.2025.109952
  • Published:
    2025-08-01
  • Journal:
  • Impact factor:
    3.600
  • Authors:
    Mohammad Mahdi Omati;Seyed Mohammad Karbasi;Arash Amini
  • Corresponding author:
    Arash Amini
Fast High-Quality Directed Graph Learning
Performance evaluation of automotive dealerships using grouped mixture of regressions
Case Studies in Nondestructive Testing and Evaluation
  • DOI:
  • Published:
  • Journal:
  • Impact factor:
    0
  • Authors:
    Arash Amini;M. Entezami;M. Papaelias
  • Corresponding author:
    M. Papaelias

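Similar NSFC grants

Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
  • Award number:
  • Approval year:
    2024
  • Funding amount:
    (in units of 10,000 CNY)
  • Project type:
    Collaborative Innovation Research Team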

Similar overseas grants

CAREER: Next-Generation Methods for Statistical Integration of High-Dimensional Disparate Data Sources
  • Award number:
    2422478
  • Fiscal year:
    2024
  • Funding amount:
    $400,000
  • Project type:
    Continuing Grant
CAREER: Practical algorithms and high dimensional statistical methods for multimodal haplotype modelling
  • Award number:
    2239870
  • Fiscal year:
    2023
  • Funding amount:
    $400,000
  • Project type:
    Standard Grant
CAREER: Towards Tight Guarantees of Markov Chain Sampling Algorithms in High Dimensional Statistical Inference
  • Award number:
    2237322
  • Fiscal year:
    2023
  • Funding amount:
    $400,000
  • Project type:
    Continuing Grant
CAREER: Computer-Intensive Statistical Inference on High-Dimensional and Massive Data: From Theoretical Foundations to Practical Computations
  • Award number:
    2347760
  • Fiscal year:
    2023
  • Funding amount:
    $400,000
  • Project type:
    Continuing Grant
CAREER: Next-Generation Methods for Statistical Integration of High-Dimensional Disparate Data Sources
  • Award number:
    2044823
  • Fiscal year:
    2021
  • Funding amount:
    $400,000
  • Project type:
    Continuing Grant
CAREER: Robust Causal And Statistical Inference In High Dimensional Structured Systems With Hidden Variables
  • Award number:
    1942239
  • Fiscal year:
    2020
  • Funding amount:
    $400,000
  • Project type:
    Continuing Grant
CAREER: Valid and Scalable Inference for High-dimensional Statistical Models
  • Award number:
    1844481
  • Fiscal year:
    2019
  • Funding amount:
    $400,000
  • Project type:
    Continuing Grant
CAREER: Computer-Intensive Statistical Inference on High-Dimensional and Massive Data: From Theoretical Foundations to Practical Computations
  • Award number:
    1752614
  • Fiscal year:
    2018
  • Funding amount:
    $400,000
  • Project type:
    Continuing Grant
CAREER: Scaling Up Knowledge Discovery in High-Dimensional Data Via Nonconvex Statistical Optimization
  • Award number:
    1906169
  • Fiscal year:
    2018
  • Funding amount:
    $400,000
  • Project type:
    Continuing Grant
CAREER: Scaling Up Knowledge Discovery in High-Dimensional Data Via Nonconvex Statistical Optimization
  • Award number:
    1652539
  • Fiscal year:
    2017
  • Funding amount:
    $400,000
  • Project type:
    Continuing Grant