
CIF: Small: Self-Adaptive Optimization Algorithms with Fast Convergence via Geometry-Adapted Hyper-Parameter Scheduling


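Basic Information

Award number:
2106216
Principal investigator:
Yi Zhou
Amount:
About $411,200
Institution:
University of Utah
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2021
Funding country:
United States
Period:
2021-07-01 to 2024-06-30
Status:
Completed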

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2106216&HistoricalAwards=false
Keywords:
CIF Small Self Adaptive Optimization

Abstract

Machine-learning and artificial-intelligence techniques have been widely applied in modern society to enhance quality of life. In these applications, machine-learning models such as neural networks are trained on large datasets using various optimization algorithms, which iteratively adjust the model parameters and converge to a good model. The convergence of these optimization algorithms often relies on choosing a good set of hyper-parameters. For example, one important algorithm hyper-parameter is the step size, which controls the scale of the update applied to the model parameters in every iteration; it must be chosen carefully to avoid slow convergence and possible divergence. In practice, these algorithm hyper-parameters are either guided by optimization theory or set through manual fine-tuning. Theory-guided hyper-parameters often rely on unknown geometric information about the model and are often too conservative, resulting in slow convergence, whereas manually fine-tuned hyper-parameters depend critically on the specific application and algorithm and often introduce substantial computational overhead. This project aims to address these issues by developing a principled, computationally light, and effective hyper-parameter scheduling scheme for different types of optimization algorithms to achieve fast and stable convergence. The adapted hyper-parameter scheduling scheme is intended to help machine-learning practitioners tune the algorithm hyper-parameters and dynamically adapt them to the ongoing optimization process. This has a further positive impact on the implementation of large-scale machine-learning applications such as autonomous driving, training adversarially robust models, and robust decision making in finance and control.
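The role of the step size described above can be illustrated with a minimal sketch. It runs plain gradient descent on a one-dimensional quadratic; the objective, the curvature value, and the three step sizes are illustrative assumptions chosen for this example and do not come from the award abstract.

```python
# Minimal sketch (illustrative assumptions only): plain gradient descent on
# f(x) = 0.5 * CURVATURE * x**2, whose unique minimizer is x = 0.

CURVATURE = 10.0  # hypothetical curvature constant chosen for the demo


def grad(x):
    """Gradient of f(x) = 0.5 * CURVATURE * x**2."""
    return CURVATURE * x


def gradient_descent(grad_fn, x0, step_size, num_iters):
    """Run num_iters gradient steps with a fixed step size and return the last iterate."""
    x = x0
    for _ in range(num_iters):
        x = x - step_size * grad_fn(x)
    return x


if __name__ == "__main__":
    # Each step multiplies x by (1 - step_size * CURVATURE), so the iterates
    # shrink only when that factor has magnitude below 1.
    for step_size in (0.001, 0.05, 0.25):
        x_final = gradient_descent(grad, x0=1.0, step_size=step_size, num_iters=50)
        print(f"step_size={step_size}: |x| after 50 iterations = {abs(x_final):.3e}")
```

In this toy setting the smallest step size makes slow progress, the middle one converges quickly, and the largest one diverges, which is the trade-off the abstract refers to.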
In this project, the researchers are developing a principled and efficient algorithm hyper-parameter scheduling framework that jointly adapts different algorithm hyper-parameters to the local geometry of the nonconvex objective function for a variety of popular optimization algorithms, and are corroborating it with strong theoretical convergence guarantees in nonconvex machine learning. Specifically, the researchers are developing such a geometry-adapted hyper-parameter scheduling scheme for deterministic optimization algorithms, including first-order gradient-based algorithms, accelerated gradient algorithms, and second-order Newton-type algorithms. The researchers are developing new analysis tools that advance the understanding of the relation between hyper-parameters and the dynamic optimization process. The iteration and computation complexities of these algorithms are being established for nonconvex optimization. Based on this development, the researchers are extending the adapted hyper-parameter scheduling scheme to stochastic optimization algorithms, which use mini-batch random sampling and therefore necessitate joint scheduling of step size and batch size. Sample-complexity analysis and high-probability convergence guarantees are being established for these algorithms. Furthermore, these developments are guiding the design of adapted hyper-parameter scheduling schemes for gradient-based minimax optimization algorithms. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
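To give a concrete sense of what adapting a step size to local geometry can look like, the sketch below uses backtracking line search with the Armijo sufficient-decrease condition. This is a standard textbook technique swapped in for illustration; it is not the scheduling scheme developed in this project, which the abstract does not specify. The nonconvex test function and all constants are assumptions made for the example.

```python
# Illustrative sketch: gradient descent with Armijo backtracking line search.
# A textbook technique used as a stand-in, NOT the project's scheme.


def f(x):
    """Hypothetical nonconvex objective with local minimizers at x = -1 and x = +1."""
    return 0.25 * x**4 - 0.5 * x**2


def grad(x):
    """Derivative of f."""
    return x**3 - x


def backtracking_gradient_descent(f, grad, x0, initial_step=1.0, shrink=0.5,
                                  sufficient_decrease=1e-4, num_iters=100, tol=1e-6):
    """Return the final iterate and the list of step sizes accepted along the way."""
    x = x0
    accepted_steps = []
    for _ in range(num_iters):
        g = grad(x)
        if abs(g) < tol:  # stop once the gradient is nearly zero
            break
        step = initial_step
        # Shrink the trial step until the Armijo condition holds:
        # f(x - step * g) <= f(x) - sufficient_decrease * step * g**2
        while f(x - step * g) > f(x) - sufficient_decrease * step * g * g:
            step *= shrink
        accepted_steps.append(step)
        x = x - step * g
    return x, accepted_steps


if __name__ == "__main__":
    x_final, steps = backtracking_gradient_descent(f, grad, x0=2.5)
    print("final iterate:", x_final)
    print("accepted step sizes:", steps)
```

The accepted step sizes change from iteration to iteration: the search takes smaller steps where the function is steep and larger ones where it is flat, without needing a curvature constant in advance. The price is extra function evaluations per iteration, which is the kind of overhead a computation-light scheme would aim to reduce.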
