
Mechanics-Based Algorithms for Sampling, Control, and Learning in Non-Convex Domains


Basic information

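Award number:
2122856
Principal investigator:
Andrew Lamperski
Amount:
approximately $335,300
Host institution:
University of Minnesota-Twin Cities
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2021
Funding country:
United States
Start and end dates:
2021-09-01 to 2024-08-31
Project status:
Completed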

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2122856&HistoricalAwards=false
Keywords:
Mechanics Based Algorithms Sampling Control

Project abstract

This grant will fund research that enables smart devices to adapt automatically to novel situations, with reliable guarantees of safe and efficient behavior, thereby promoting the progress of science and advancing the national prosperity and health. In the near future, a large portion of the population will rely on devices such as self-driving cars and smart medical implants that make safety-critical decisions without human intervention. In these devices, it is not possible for engineers to manually prescribe all the behaviors that will arise during operation. A self-driving car must steer reliably in unfamiliar road conditions. A neural stimulator for seizure suppression must be personalized to the individual patient. To enable deployment of highly autonomous smart devices on a large scale, these devices must be able to learn appropriate behaviors by themselves. Currently, most learning algorithms for real-world automated systems lack provable guarantees for safety and performance. The methods devised in this project will overcome this limitation, benefitting applications in transportation, healthcare, and home automation. The development of remotely controllable physical experiments will help make principles of control and automated learning accessible to high school and undergraduate students.

This research aims to make fundamental contributions to the development of a model-based reinforcement learning methodology that guarantees stability and near-optimal performance for a wide class of unknown nonlinear stochastic systems. It achieves this outcome by addressing two unresolved challenges for existing learning methods: 1) a lack of provable guarantees for convergence to desired probability distributions and to global optima of the corresponding non-convex optimization problems, and 2) the lack of stability guarantees or need for initial stabilizing controllers. The research will leverage new insights on non-smooth stochastic processes to quantitatively bound convergence of solutions around global optima for a collection of algorithms derived from mechanics. Stabilizing controllers for nonlinear stochastic systems will be obtained by a novel variation on the policy iteration method, without requiring an initial stabilizing controller. The work will contribute to a rigorous understanding of algorithms for sampling, optimization, and learning for non-convex losses in non-convex domains, as well as methods of control policy evaluation, stability verification, and optimization.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
