Advancing Theory and Computation in Statistical Learning Problems

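Basic Information

Award number:
1309174
Principal investigator:
Ryan Tibshirani
Amount:
$150,000
Host institution:
Carnegie-Mellon University
Host institution country:
United States
Project type:
Continuing Grant
Fiscal year:
2013
Funding country:
United States
Project period:
2013-07-01 to 2017-06-30
Project status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1309174&HistoricalAwards=false
Keywords:
Advancing Theory Computation Statistical Learning

Abstract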

This research is composed of four related statistical learning projects. The first two projects are theoretical. In the first, the investigator will study the degrees of freedom (i.e., the effective number of parameters) of adaptive modeling techniques. It has been shown that variable selection procedures based on the L1 norm, such as the lasso, exhibit control over their effective number of parameters, since adaptivity here is counterbalanced by shrinkage in coefficient estimation. This project instead considers adaptive procedures that do not employ shrinkage, such as best subset selection, in which the effective number of parameters is (comparatively) greatly inflated. In the second project, the investigator will examine trend filtering, a recently proposed nonparametric regression estimator fit by penalizing the L1 norm of discrete derivatives. Trend filtering estimates can be computed efficiently (e.g., using the work of the third project), but their theoretical properties are not well understood. The goal is to study the rate of convergence of trend filtering estimates over broad function classes, and to make detailed comparisons with existing nonparametric regression estimators (such as smoothing splines and locally adaptive regression splines). The last two projects are computational. The third project is focused on efficient computations for the generalized lasso path algorithm. The generalized lasso is an estimator that uses the L1 norm to encourage specific structural properties, as opposed to pure sparsity itself; one such example is the trend filtering estimator mentioned above. The fourth and final project is an extension of the idea behind stagewise regression to general convex regularization problems. Forward stagewise regression is a simple, scalable algorithm whose estimates can be seen as an approximation to the lasso regularization path. The stagewise extension to general problems produces efficient approximation algorithms for the group lasso, matrix completion, and more; approximation guarantees are unknown and will be studied.

Statistical modeling, estimation, and inference are becoming integral aspects of problems in many scientific disciplines. As a result, the field of statistical learning, which broadly encapsulates these three statistical tasks, has witnessed a recent explosion of research. Arguably, current research in this field focuses on creating new methods or extending methods to new domains, and much less so on understanding existing methods. Instead, the investigator will pursue four projects aimed at (i) deepening our understanding of a few well-known (but not as well-understood) statistical learning techniques, and (ii) developing algorithms so that we can employ these techniques efficiently at a larger scale, and hence evaluate their performance. Code for such algorithms will be made freely available through open-source software. Potential applications of this work include the forecasting of medical diagnoses, the modeling of brain signals in neuroscience, and the development of recommender systems.
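As an illustration of the forward stagewise idea mentioned in the abstract, the following is a minimal sketch of the textbook incremental forward stagewise procedure for least squares. It is not the investigator's code. The function name `forward_stagewise`, the parameters `step` and `n_steps`, and the stopping tolerance are hypothetical choices made for this example, and the sketch assumes the columns of `X` are centered and on a comparable scale.

```python
def forward_stagewise(X, y, step=0.01, n_steps=1000):
    """Incremental forward stagewise regression (illustrative sketch).

    X is a list of rows, y a list of responses. Assumes the columns of X
    are centered and comparably scaled. Returns the list of coefficient
    vectors visited, which traces an approximate regularization path.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)
    path = [beta[:]]
    for _ in range(n_steps):
        # Inner product of every column with the current residual.
        corr = [sum(X[i][j] * residual[i] for i in range(n)) for j in range(p)]
        # Pick the column most correlated (in absolute value) with the residual.
        j = max(range(p), key=lambda k: abs(corr[k]))
        if abs(corr[j]) < 1e-12:  # hypothetical tolerance: nothing left to fit
            break
        # Move that single coefficient by a small fixed amount.
        delta = step if corr[j] > 0 else -step
        beta[j] += delta
        for i in range(n):
            residual[i] -= delta * X[i][j]
        path.append(beta[:])
    return path


# Toy usage: the response depends only on the first column.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
path = forward_stagewise(X, y)
print(path[-1])
```

Each iteration changes one coefficient by a tiny fixed amount, which is why the sequence of estimates behaves like a slowly relaxed regularization path; smaller values of `step` give a finer path at the cost of more iterations.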
