Nonparametric Confidence Sequences and their Applications

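Basic information

Award number: 1916320
Principal investigator: Aaditya Ramdas
Amount: $160,000
Institution: Carnegie-Mellon University
Institution country: United States
Award type: Standard Grant
Fiscal year: 2019
Funding country: United States
Period: 2019-08-01 to 2022-07-31
Status: Completed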

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1916320&HistoricalAwards=false
Keywords: Nonparametric Confidence Sequences their Applications

Project abstract

Large-scale sequential testing and estimation is now a daily task in the tech industry, with large internet companies running hundreds or thousands of experiments (sometimes called A/B tests) per week to understand their customers' preferences and to improve product performance and user experience. Such experiments are inherently sequential: visitors arrive in a stream, and outcomes are typically observed quickly relative to the duration of the test. These experiments are loosely planned: there are few hurdles to starting a new test and little oversight of how it is run, unlike clinical trials, which are heavily regulated and formally planned, with statisticians involved from the start. The experiments are also continuously monitored, and adaptive choices are made about whether to stop early and draw conclusions or to collect more data. In other words, sample sizes and budgets are rarely set in stone in advance, and the data scientist running the experiment has considerable flexibility. Such situations are common in the sciences as well, with telescopes collecting astronomical data sequentially (perhaps to test for the presence of black holes or to estimate the sizes of galaxies) or psychologists collecting human-subject data sequentially (and analyzing effect sizes along the way). However, a major drawback of such flexible, loosely planned, sequential experimentation with fluid decision making is that it is far from trivial to provide correct inferential guarantees, either along the way or when the experiment is terminated. Traditional confidence intervals (CIs) and p-values, the bread and butter of classical statistics, are designed for fixed sample sizes and can be used only once, at that predetermined sample size. Using standard CIs repeatedly at different sample sizes, or after adaptively stopping, without any correction for the multiple intervals constructed, invalidates their guarantees and leads to an increase in erroneous conclusions. The graduate student support will be used for research on sequential analysis and concentration inequalities.

The PI proposes to revisit a classical notion called a "confidence sequence" (CS), due to Darling and Robbins (1967): a (potentially infinite) sequence of confidence intervals that is, with high probability, simultaneously valid at all times. Because of this simultaneous guarantee, an analyst may keep peeking at the data and the constructed confidence sequence, adaptively choosing to stop collecting data or to collect more, and still have correct inferential guarantees throughout the process, including when it stops. Confidence sequences can be converted to always-valid p-values, which are also valid at arbitrary stopping times. Using modern martingale techniques, we have recently been able to generalize prior constructions of confidence sequences to several novel nonparametric settings, yielding both the tightest known closed-form CS expressions and the sharpest numerical methods in practice. This project seeks to extend the scope of these advances both theoretically and practically. Examples of extensions that the PI wishes to pursue include designing new confidence sequences for vector-valued means, and fully empirical bounds that do not depend on unknown parameters. We will also explore applications of these bounds to sequential testing and estimation tasks.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
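
The contrast the abstract draws between repeatedly checked fixed-sample intervals and a confidence sequence can be illustrated with a small simulation. The sketch below is not the martingale-based construction the abstract refers to; it uses a much simpler and looser textbook device, assumed here only for illustration: Hoeffding's inequality at each time t with error budget alpha / (t (t + 1)), combined by a union bound over all t. The function names, the Bernoulli(0.5) data stream, the horizon, and the number of runs are all hypothetical choices, not details from the award.

```python
import math
import random
from statistics import NormalDist

ALPHA = 0.05  # target error level (assumed for this illustration)
Z = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided normal quantile


def fixed_time_radius(t):
    """Normal-approximation radius meant for ONE pre-chosen sample size t.

    Uses 0.5, the largest possible standard deviation of a [0, 1]-valued
    observation.
    """
    return Z * 0.5 / math.sqrt(t)


def union_bound_cs_radius(t):
    """Radius of a simple confidence sequence for [0, 1]-valued data.

    Hoeffding's inequality is applied at level alpha_t = ALPHA / (t (t + 1)).
    Since the alpha_t sum to ALPHA over t = 1, 2, ..., a union bound makes
    all intervals hold simultaneously with probability at least 1 - ALPHA.
    """
    alpha_t = ALPHA / (t * (t + 1))
    return math.sqrt(math.log(2 / alpha_t) / (2 * t))


def ever_miss_rates(true_mean=0.5, horizon=500, runs=1000, seed=0):
    """Fraction of runs in which each interval EVER excludes the true mean
    while the analyst peeks after every observation."""
    rng = random.Random(seed)
    naive_misses = 0
    cs_misses = 0
    for _ in range(runs):
        total = 0
        naive_missed = False
        cs_missed = False
        for t in range(1, horizon + 1):
            total += rng.random() < true_mean  # one Bernoulli observation
            err = abs(total / t - true_mean)
            naive_missed = naive_missed or err > fixed_time_radius(t)
            cs_missed = cs_missed or err > union_bound_cs_radius(t)
        naive_misses += naive_missed
        cs_misses += cs_missed
    return naive_misses / runs, cs_misses / runs


if __name__ == "__main__":
    naive_rate, cs_rate = ever_miss_rates()
    print(f"fixed-sample CI, checked at every step: {naive_rate:.3f}")
    print(f"union-bound confidence sequence:        {cs_rate:.3f}")
```

Under these assumptions, the fixed-sample interval should miss the true mean at some point in far more than 5% of runs, while the confidence sequence should stay within its 5% budget. The price is wider intervals at every step, which is the gap that sharper constructions aim to narrow.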
