Second-order Hessian-free methods for statistical learning and stochastic optimization

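Basic information

Grant number: RGPIN-2022-04400
Principal investigator: Bastin, Fabian
Amount: $31,300
Host institution: Université de Montréal
Host institution country: Canada
Program: Discovery Grants Program - Individual
Fiscal year: 2022
Funding country: Canada
Project period: 2022-01-01 to 2023-12-31
Project status: Completed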

Source: https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=757409
Keywords: Second order Hessian free methods

Project abstract

The success of machine learning over the last decade has had a deep impact on the mathematical optimization community and has renewed interest in methods such as stochastic gradient descent. Such methods offer cheap iterations, which allow fast progress at the beginning of the optimization, and they avoid storing dense matrices, which is prohibitive when the number of parameters is very large. However, they have difficulty converging close to the solution and rely on vanishing step sizes to guarantee theoretical convergence; depending on the starting point, they may also struggle to reach a neighborhood of the solution. We investigate second-order Hessian-free strategies that capitalize on existing nonlinear programming theory while scaling with the number of observations and decision variables. The methods rely on adaptive sample average approximation (SAA) within a trust-region framework, combined with standard variance reduction techniques: the sample size is controlled by comparing the estimated reduction of the objective function achieved at an iteration with the statistical noise. At each iteration, quasi-Newton candidate iterates can be obtained without explicit matrix storage, and we explore how to exploit the structure of typical estimation problems to improve the approach. A second objective of the proposed research is to use the statistical information obtained on the model to develop better early stopping strategies. Such strategies are especially important because large samples are required close to the solution, which makes iterations costly. Another benefit is the ability to provide the modeler with information about the residual uncertainty at the solution found.
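The two computational ingredients named in this paragraph, quasi-Newton candidate iterates obtained without storing a matrix and a sample size driven by the estimated reduction versus the statistical noise, can be illustrated with a minimal sketch. The abstract does not name a specific quasi-Newton scheme or sampling test, so this sketch assumes the standard L-BFGS two-loop recursion and a simple z-test on paired per-observation differences; the function names `lbfgs_direction` and `update_sample_size`, the threshold `z`, and the `growth` factor are hypothetical, and the trust-region and variance-reduction machinery is omitted.

```python
import numpy as np


def lbfgs_direction(grad, s_list, y_list):
    """L-BFGS two-loop recursion: return d = -H @ grad without forming H.

    s_list, y_list hold the m most recent pairs s_i = x_{i+1} - x_i and
    y_i = g_{i+1} - g_i (oldest first); memory is O(m * n), never O(n^2).
    Assumes every pair satisfies the curvature condition s_i' y_i > 0.
    """
    q = np.array(grad, dtype=float)
    s_list = [np.asarray(s, dtype=float) for s in s_list]
    y_list = [np.asarray(y, dtype=float) for y in y_list]
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest.
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        alpha = rho * np.dot(s, q)
        q -= alpha * y
        alphas.append(alpha)
    # Initial matrix H_0 = gamma * I, scaled with the newest pair.
    gamma = 1.0
    if s_list:
        gamma = np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1])
    r = gamma * q
    # Second loop: oldest pair to newest (alphas were stored newest first).
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * np.dot(y, r)
        r += (alpha - beta) * s
    return -r


def update_sample_size(f_curr, f_cand, n_max, z=1.96, growth=2.0):
    """Keep or grow the SAA sample size (hypothetical rule).

    f_curr, f_cand: per-observation objective values at the current point and
    at the candidate, evaluated on the SAME sample (at least two observations).
    The estimated reduction is trusted only if it exceeds z times its standard
    error; otherwise the sample is enlarged, capped at n_max.
    """
    d = np.asarray(f_curr, dtype=float) - np.asarray(f_cand, dtype=float)
    n = d.size
    reduction = d.mean()
    stderr = d.std(ddof=1) / np.sqrt(n)
    if reduction > z * stderr:
        return n  # reduction dominates the noise: keep the current sample
    return min(n_max, int(np.ceil(growth * n)))
```

On a diagonal quadratic with curvature pairs along the coordinate axes, `lbfgs_direction` recovers the Newton step exactly, and in general it always satisfies the secant condition for the newest pair. Evaluating both points on the same sample makes the paired differences far less noisy than two independent estimates, which is why the rule works on `f_curr - f_cand` rather than on the two means separately.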
We also explore the effect of observations that are not independent and identically distributed, as they can lead to biased solutions and may have a negative impact on some communities when the model is used to design policies that affect individuals, for instance in transportation or energy. Similarly, model misspecification must be analyzed, both for its effect on algorithm convergence and for its effect on solution robustness. Another important aspect we consider is the feasible set, since most optimization algorithms used in machine learning are designed for unconstrained problems only. However, many real applications, for instance in energy, involve nonlinear constraints whose expressions can depend on the realization of the uncertainty, and the feasible set is not guaranteed to be convex. A standard approach is to turn to methods that seek a Karush-Kuhn-Tucker (KKT) point, but stochastic approximation methods have received much less attention in this setting. SAA methods also raise additional challenges: adaptive sampling strategies have more difficulty exploiting the information geometry, and the sample may have to be adjusted when some constraints must be satisfied for all or nearly all scenarios.
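For the constrained setting described in this paragraph, a minimal sketch of two quantities such a method would monitor is given below. It assumes inequality constraints written as c(x) <= 0; in an SAA method the gradient, constraint values, and Jacobian passed to `kkt_residual` would be computed from sample averages, which is an assumption of this illustration rather than the project's stated design. The function names `kkt_residual` and `scenario_satisfaction` are hypothetical. The first measures the largest violation of the KKT conditions (stationarity, primal and dual feasibility, complementarity); the second computes the fraction of scenarios in which every constraint holds, one simple signal for deciding whether the sample needs adjusting when constraints must hold for all or nearly all scenarios.

```python
import numpy as np


def kkt_residual(grad_f, c_vals, c_jac, lam):
    """Largest KKT violation for min f(x) s.t. c(x) <= 0 (hypothetical helper).

    grad_f : (n,) gradient of the objective at x
    c_vals : (m,) constraint values c_i(x)
    c_jac  : (m, n) Jacobian of c at x
    lam    : (m,) Lagrange multiplier estimates
    """
    grad_f = np.asarray(grad_f, dtype=float)
    c_vals = np.asarray(c_vals, dtype=float)
    c_jac = np.asarray(c_jac, dtype=float).reshape(len(c_vals), len(grad_f))
    lam = np.asarray(lam, dtype=float)
    # Stationarity of the Lagrangian: grad f + J' lam = 0.
    stationarity = np.linalg.norm(grad_f + c_jac.T @ lam, np.inf)
    # Primal feasibility c <= 0, dual feasibility lam >= 0.
    primal = np.max(np.maximum(c_vals, 0.0), initial=0.0)
    dual = np.max(np.maximum(-lam, 0.0), initial=0.0)
    # Complementarity: lam_i * c_i = 0 for every constraint.
    complementarity = np.max(np.abs(lam * c_vals), initial=0.0)
    return max(stationarity, primal, dual, complementarity)


def scenario_satisfaction(c_scen, tol=0.0):
    """Fraction of scenarios (rows) in which every constraint c_i(x, xi_s) <= tol."""
    c_scen = np.atleast_2d(np.asarray(c_scen, dtype=float))
    return float(np.mean(np.all(c_scen <= tol, axis=1)))
```

For example, for min x^2 subject to 1 - x <= 0, the point x = 1 with multiplier 2 gives a residual of zero, while a wrong multiplier leaves a nonzero stationarity term. Because the feasible set may be nonconvex, a zero residual certifies only a KKT point, not a global solution.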