BIGDATA: F: Testing High Dimensional Distributions without the Curse of Dimensionality

Basic Information

  • Award No.:
    1741137
  • Amount:
    $900,000
  • Institution Country:
    United States
  • Program Type:
    Standard Grant
  • Fiscal Year:
    2017
  • Funding Country:
    United States
  • Duration:
    2017-12-01 to 2020-11-30
  • Project Status:
    Completed

Project Abstract

Scientists develop models that explain their observations. But how many observations are needed to verify the validity of a model? When the model is probabilistic, the question becomes: how many samples from a distribution are needed to test whether it has a certain property? Arguably, this problem lies at the foundations of scientific thought, and recent years have seen a tremendous body of work in the Computer Science literature trying to close in on the precise sample and time complexity needed to test distribution properties. Data is often high-dimensional; for example, patients' medical records have many entries. However, high-dimensional distribution data is notoriously hard to deal with. This project will find new ways of overcoming these difficulties by isolating properties of data occurring in practice that help simplify distribution testing problems.

The broader impact of this project includes advancing the interface of Computer Science, Statistics, and Learning. Our methods will be tested on a healthcare dataset covering over 3.7 million patients and will be used to test the accuracy of common models used to improve healthcare outcomes. The broader impact of this project also includes engaging elementary school children in Computer Science activities, conducting mathematical research with high school students through MIT PRIMES, and participating in activities that promote women in research.

Current work on distribution property testing has focused on properties of single-dimensional distributions, such as uniformity, monotonicity, and log-concavity, with only a few results on testing properties of high-dimensional distributions. Unfortunately, testing properties of high-dimensional distributions quickly runs into exponential lower bounds on sample complexity. The goal of the project is to develop new analysis frameworks for overcoming these lower bounds.
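To make the sample-complexity discussion concrete, here is a minimal sketch of the classic collision-based uniformity tester from the distribution-testing literature. It is an illustration, not this project's method, and the names `collision_rate`, `is_plausibly_uniform`, `suggested_sample_size`, and the constant `c` are hypothetical. The tester compares the empirical collision rate with 1/n: the uniform distribution on n elements collides with probability exactly 1/n, while any distribution at total-variation distance at least eps from uniform collides with probability at least (1 + 4·eps²)/n. Such testers need on the order of sqrt(n)/eps² samples, and a matching lower bound shows this cannot be substantially improved for arbitrary distributions. Over {0,1}^d the domain size is n = 2^d, so the budget grows like 2^(d/2): this is the exponential blow-up mentioned above.

```python
import random
from collections import Counter
from math import comb, sqrt


def collision_rate(samples):
    """Fraction of unordered sample pairs that are equal.

    This is an unbiased estimate of the collision probability sum_x p(x)^2.
    """
    m = len(samples)
    if m < 2:
        raise ValueError("need at least two samples")
    counts = Counter(samples)
    collisions = sum(comb(c, 2) for c in counts.values())
    return collisions / comb(m, 2)


def is_plausibly_uniform(samples, n, eps):
    """Collision tester for uniformity over a domain of size n.

    Uniform has collision probability exactly 1/n; any distribution at
    total-variation distance >= eps from uniform has collision probability
    >= (1 + 4*eps**2)/n. The threshold below sits between the two.
    """
    return collision_rate(samples) <= (1 + 2 * eps**2) / n


def suggested_sample_size(n, eps, c=8):
    """Illustrative budget c * sqrt(n) / eps^2; the constant c is an assumption."""
    return int(c * sqrt(n) / eps**2)


if __name__ == "__main__":
    # Budget for distributions over {0,1}^d, i.e. n = 2^d: grows like 2^(d/2).
    for d in (10, 20, 40):
        print(d, suggested_sample_size(2**d, 0.5))
    # prints: 10 1024 / 20 32768 / 40 33554432 (one pair per line)

    rng = random.Random(0)
    n, eps = 2**10, 0.5
    m = suggested_sample_size(n, eps)
    uniform = [rng.randrange(n) for _ in range(m)]
    half = [rng.randrange(n // 2) for _ in range(m)]  # TV distance 1/2 from uniform
    print(is_plausibly_uniform(uniform, n, eps), is_plausibly_uniform(half, n, eps))
```

Doubling the dimension from 20 to 40 multiplies the budget by 2^10, even though the test itself is unchanged.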
Typically, the lower bounds construct highly complex distributions that lack a property but are very hard to distinguish from distributions that have it. Our thesis is that such rich structure may not be present in many practical settings of interest. The overarching question of our research is then: are there reasonable assumptions about the unknown distribution under which high-dimensional testing problems become more tractable? This research will (1) explore how the expressive language of graphical models can be used to restrict the correlation structure of high-dimensional distributions in ways that can be leveraged for faster testing; and (2) develop analysis frameworks for testing the generating models of combinatorial structures, such as social networks, from a single sample or a constant number of samples. This sounds like an oxymoron, but it becomes possible under adequate assumptions about the model generating the combinatorial structure. Direction (1) will reveal important connections to Bayesian networks and their use in healthcare decision making, as well as to computational biology and phylogenetics, while direction (2) connects to social network modeling.
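Direction (1) restricts correlation structure through graphical models. The simplest such restriction, a graphical model with no edges, is a product distribution on {0,1}^d. The sketch below is a simplified illustration under that assumption, not the project's method: it tests uniformity using only per-coordinate counts. If X_i counts the ones in coordinate i among m samples, the statistic Z = Σ_i [(X_i − m/2)² − m/4] has expectation m(m−1)·Σ_i (p_i − 1/2)², which is zero exactly for the uniform distribution. Results in the distribution-testing literature show that, for product distributions, on the order of sqrt(d)/eps² samples suffice, compared with order 2^(d/2)/eps² for arbitrary distributions. The function names, the threshold constant `c = 1/8`, and the demo parameters are hypothetical choices.

```python
import random


def product_uniformity_statistic(samples):
    """Return Z = sum_i [(X_i - m/2)^2 - m/4], where X_i counts ones in coordinate i.

    If the m samples are i.i.d. from a product distribution on {0,1}^d with
    coordinate means p_1, ..., p_d, then E[Z] = m(m-1) * sum_i (p_i - 1/2)^2,
    which is zero exactly when the distribution is uniform.
    """
    m = len(samples)
    d = len(samples[0])
    z = 0.0
    for i in range(d):
        ones = sum(x[i] for x in samples)  # X_i ~ Binomial(m, p_i)
        z += (ones - m / 2) ** 2 - m / 4
    return z


def looks_uniform_product(samples, eps, c=1 / 8):
    """Accept (True) if the samples look uniform, ASSUMING a product distribution.

    For product distributions, being eps-far from uniform in total variation
    forces sum_i (p_i - 1/2)^2 to be at least a constant times eps^2; the
    constant c in the threshold is an illustrative, untuned choice.
    """
    m = len(samples)
    return product_uniformity_statistic(samples) <= c * m * (m - 1) * eps**2


def sample_product(rng, means, m):
    """Draw m samples from the product of Bernoulli(means[i]) coordinates."""
    return [tuple(int(rng.random() < p) for p in means) for _ in range(m)]


if __name__ == "__main__":
    rng = random.Random(1)
    d, m = 100, 2000
    uniform = sample_product(rng, [0.5] * d, m)
    biased = sample_product(rng, [0.6] * d, m)  # every coordinate slightly biased
    print(looks_uniform_product(uniform, 0.5), looks_uniform_product(biased, 0.5))
```

The statistic touches each coordinate independently, which is exactly what the no-edges assumption buys. Richer graphical models would need statistics over small groups of coordinates instead.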

Project Outcomes

Journal articles (56)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Approximating the noise sensitivity of a monotone Boolean function
  • Published:
    2019
  • Impact Factor:
    0
  • Authors:
    Rubinfeld, R.;Vasilyan, A.
  • Corresponding Author:
    Vasilyan, A.
Resource-Efficient Common Randomness and Secret-Key Schemes
On the complexity of modulo-q arguments and the Chevalley-Warning theorem
Local Algorithms for Sparse Spanning Graphs
  • DOI:
    10.1007/s00453-019-00612-6
  • Published:
    2020
  • Impact Factor:
    1.1
  • Authors:
    Levi, Reut;Ron, Dana;Rubinfeld, Ronitt
  • Corresponding Author:
    Rubinfeld, Ronitt
Towards Testing Monotonicity of Distributions Over General Posets

Other publications by Ronitt Rubinfeld

A Self-Tester for Linear Functions over the Integers with an Elementary Proof of Correctness
  • DOI:
    10.1007/s00224-015-9639-z
  • Published:
    2015-06-20
  • Impact Factor:
    0.400
  • Authors:
    Sheela Devadas;Ronitt Rubinfeld
  • Corresponding Author:
    Ronitt Rubinfeld
On the time and space complexity of computation using write-once memory or is pen really much worse than pencil?
  • DOI:
    10.1007/bf02835833
  • Published:
    1992-06-01
  • Impact Factor:
    0.400
  • Authors:
    Sandy Irani;Moni Naor;Ronitt Rubinfeld
  • Corresponding Author:
    Ronitt Rubinfeld
Learning fallible Deterministic Finite Automata
  • DOI:
    10.1007/bf00993409
  • Published:
    1995-02-01
  • Impact Factor:
    2.900
  • Authors:
    Dana Ron;Ronitt Rubinfeld
  • Corresponding Author:
    Ronitt Rubinfeld
Exactly Learning Automata of Small Cover Time
  • DOI:
    10.1023/a:1007348927491
  • Published:
    1997-04-01
  • Impact Factor:
    2.900
  • Authors:
    Dana Ron;Ronitt Rubinfeld
  • Corresponding Author:
    Ronitt Rubinfeld

Other grants by Ronitt Rubinfeld

AF: SMALL: Extending the Reach of Distribution Testing via Structure
  • Award No.:
    2310818
  • Fiscal Year:
    2023
  • Amount:
    $900,000
  • Program Type:
    Standard Grant
AF: Small: Sparsity in Local Computation
  • Award No.:
    2006664
  • Fiscal Year:
    2020
  • Amount:
    $900,000
  • Program Type:
    Standard Grant
AitF: Collaborative Research: Fast, Accurate, and Practical: Adaptive Sublinear Algorithms for Scalable Visualization
  • Award No.:
    1733808
  • Fiscal Year:
    2017
  • Amount:
    $900,000
  • Program Type:
    Standard Grant
EAGER: Testing Pseudorandom Distributions
  • Award No.:
    1650733
  • Fiscal Year:
    2016
  • Amount:
    $900,000
  • Program Type:
    Standard Grant
AF: Small: New directions in the design of local computation algorithms
  • Award No.:
    1420692
  • Fiscal Year:
    2014
  • Amount:
    $900,000
  • Program Type:
    Standard Grant
AF: Small: Local Computation Algorithms
  • Award No.:
    1217423
  • Fiscal Year:
    2012
  • Amount:
    $900,000
  • Program Type:
    Standard Grant
AF: Medium: Taming Massive Data with Sub-Linear Algorithms
  • Award No.:
    1065125
  • Fiscal Year:
    2011
  • Amount:
    $900,000
  • Program Type:
    Standard Grant
MSPA-MCS: Learning to Rank
  • Award No.:
    0732334
  • Fiscal Year:
    2007
  • Amount:
    $900,000
  • Program Type:
    Standard Grant
The Complexity of Testing Distributions
  • Award No.:
    0514771
  • Fiscal Year:
    2005
  • Amount:
    $900,000
  • Program Type:
    Standard Grant
CAREER: Algorithms for Self-testing/Correcting Program and Learning
  • Award No.:
    9624552
  • Fiscal Year:
    1996
  • Amount:
    $900,000
  • Program Type:
    Continuing Grant

Similar international grants

Usability testing for analysis of three-dimensional knee kinematics during gait
  • Award No.:
    2883066
  • Fiscal Year:
    2023
  • Amount:
    $900,000
  • Program Type:
    Studentship
Advanced two-dimensional braided composite materials for industrial applications using renewable resources: manufacturing, testing and modeling
  • Award No.:
    RGPIN-2022-03043
  • Fiscal Year:
    2022
  • Amount:
    $900,000
  • Program Type:
    Discovery Grants Program - Individual
Foundations of High-Dimensional and Nonparametric Hypothesis Testing
  • Award No.:
    2113684
  • Fiscal Year:
    2021
  • Amount:
    $900,000
  • Program Type:
    Standard Grant
Improving advanced two-dimensional braided composite materials for industrial applications: manufacturing, testing and modeling
  • Award No.:
    RGPIN-2016-03637
  • Fiscal Year:
    2021
  • Amount:
    $900,000
  • Program Type:
    Discovery Grants Program - Individual
DESIGN AND CONSTRUCTION OF AN OMNI-DIRECTIONAL BIOFIDELIC THREE-DIMENSIONAL SURROGATE NECK FOR HELMET AND AUTOMOTIVE CRASH TESTING
  • Award No.:
    RGPIN-2017-06013
  • Fiscal Year:
    2021
  • Amount:
    $900,000
  • Program Type:
    Discovery Grants Program - Individual
A system for multi-dimensional experimental testing of disc-shaped drive train components
  • Award No.:
    459487904
  • Fiscal Year:
    2021
  • Amount:
    $900,000
  • Program Type:
    Major Research Instrumentation
Improving advanced two-dimensional braided composite materials for industrial applications: manufacturing, testing and modeling
  • Award No.:
    RGPIN-2016-03637
  • Fiscal Year:
    2020
  • Amount:
    $900,000
  • Program Type:
    Discovery Grants Program - Individual
DESIGN AND CONSTRUCTION OF AN OMNI-DIRECTIONAL BIOFIDELIC THREE-DIMENSIONAL SURROGATE NECK FOR HELMET AND AUTOMOTIVE CRASH TESTING
  • Award No.:
    RGPIN-2017-06013
  • Fiscal Year:
    2020
  • Amount:
    $900,000
  • Program Type:
    Discovery Grants Program - Individual
DESIGN AND CONSTRUCTION OF AN OMNI-DIRECTIONAL BIOFIDELIC THREE-DIMENSIONAL SURROGATE NECK FOR HELMET AND AUTOMOTIVE CRASH TESTING
  • Award No.:
    RGPIN-2017-06013
  • Fiscal Year:
    2019
  • Amount:
    $900,000
  • Program Type:
    Discovery Grants Program - Individual
Improving advanced two-dimensional braided composite materials for industrial applications: manufacturing, testing and modeling
  • Award No.:
    RGPIN-2016-03637
  • Fiscal Year:
    2019
  • Amount:
    $900,000
  • Program Type:
    Discovery Grants Program - Individual