RI: Small: Stochastic Planning and Probabilistic Inference for Factored State and Action Spaces

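Basic Information

Award number: 1616280
Principal investigator: Roni Khardon
Amount: approx. $447,100
Institution: Tufts University
Institution country: United States
Award type: Standard Grant
Fiscal year: 2016
Funding country: United States
Period: 2016-06-01 to 2020-01-31
Status: Completed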

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1616280&HistoricalAwards=false
Keywords: RI Small Stochastic Planning Probabilistic

Project Abstract

Many important problems require controlling multiple actuators, or agents, in parallel to achieve a common, coordinated goal in a stochastic environment. Examples include scheduling in a building with multiple elevators, managing a team for fire and rescue operations, managing the inventory of a large company, controlling a robotic soccer team, and controlling a robot team that manages shelving and orders in a warehouse. These problems fit naturally into a formulation as discrete-time, central-control problems, in which we design an algorithm that decides which action each agent takes at every time step in order to optimize the common objective. The corresponding computational problem, known as stochastic planning, is challenging due to its sheer size. In particular, the number of possible states (for example, the possible positions of robots, shelves, and merchandise in a warehouse) and the number of possible joint actions (combinations of the actions of individual robots) are huge in any problem instance of interest. State-of-the-art approaches typically fail either because they require too much time to search properly for a good policy or because they require too much memory to store intermediate values. By viewing stochastic planning through the lens of probabilistic inference, this project proposes several novel, domain-independent algorithmic approaches that exploit problem structure to compute approximate solutions effectively under time constraints. The project funds are largely devoted to supporting the training and research of PhD students and therefore directly support human development in an important, high-impact area for the nation.

More concretely, we propose three competing approaches to solving such problems, all of which draw on formulating the finite-horizon control problem as probabilistic inference in a corresponding graphical model, a dynamic Bayesian network.
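For orientation, one standard way to write this kind of formulation is sketched below. The notation is an assumption for illustration, not taken from the project: x^t = (x_1^t, ..., x_n^t) are the state variables at step t, a^t is the joint action of all agents, H is the horizon, r is the reward, x^1 is the current state, and pa_i(x^t, a^t) denotes the parents of x_i^{t+1} in the dynamic Bayesian network.

```latex
\begin{aligned}
P(\mathbf{x}^{t+1} \mid \mathbf{x}^{t}, \mathbf{a}^{t})
  &= \prod_{i=1}^{n} P\big(x_i^{t+1} \mid \mathrm{pa}_i(\mathbf{x}^{t}, \mathbf{a}^{t})\big), \\
V(\mathbf{a}^{1:H})
  &= \mathbb{E}\left[\sum_{t=1}^{H} r(\mathbf{x}^{t}, \mathbf{a}^{t}) \,\middle|\, \mathbf{x}^{1}, \mathbf{a}^{1:H}\right], \\
\mathbf{a}^{\ast}
  &= \arg\max_{\mathbf{a}^{1:H}} V(\mathbf{a}^{1:H}).
\end{aligned}
```

The maximization ranges over action variables while the expectation sums out state variables. Once the reward is encoded as an auxiliary random variable, this max-over-some-variables, sum-over-the-rest pattern is the structure of a marginal MAP query, which is one way to see the link between planning and inference.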
The first approach uses the idea of Monte Carlo search but adds a strong symbolic component by introducing aggregate trajectories. Aggregate trajectories are obtained by simulating a compositional symbolic model under independence assumptions over the random variables. Each aggregate trajectory provides a value estimate that is approximate but can replace numerous individual trajectories. In this way we obtain fast value approximations and effective control under time constraints. The second approach uses problem structure to translate the inference problem into an integer linear program, where the objective value, and hence the quality, of the solution can be traded off for speed through problem decomposition. A novel construction shows how to sidestep the exponential complexity of the problem and obtain a sequence of integer programs that are both small and decomposable, yielding effective control under time constraints. The third approach, or more accurately framework, builds on the tight connection between stochastic planning and probabilistic inference in the corresponding dynamic Bayesian network. We show that variants of the first two approaches can be viewed in this light, and through this view we propose new inference algorithms for solving the stochastic planning problem. In addition, based on this analysis, we propose new algorithms for probabilistic inference and new generalized inference questions that go beyond current research on marginal MAP in graphical models.
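The aggregate-trajectory idea can be illustrated with a minimal Python sketch. Everything about the model is a hypothetical assumption chosen for illustration: a SysAdmin-style ring of machines with made-up transition probabilities, a reward equal to the number of machines that are up, and a fixed (open-loop) reboot plan. The hypothetical function aggregate_value propagates each variable's marginal probability of being up, treating neighbouring variables as independent, so a single pass stands in for many sampled trajectories; monte_carlo_value is the ordinary sampling baseline. This is not the project's algorithm: the symbolic compilation and the optimization over actions described above are omitted.

```python
import random
from typing import List, Set

# Hypothetical SysAdmin-style toy model (all parameters are illustrative assumptions):
# machines sit in a ring; an up machine stays up with prob 0.9 if its left neighbour
# is up and 0.5 otherwise; a down machine recovers on its own with prob 0.05;
# rebooting a machine makes it up at the next step with certainty.
P_UP_NEIGHBOUR_UP = 0.9
P_UP_NEIGHBOUR_DOWN = 0.5
P_RECOVER = 0.05


def step_prob(up: bool, left_up: bool, rebooted: bool) -> float:
    """P(machine is up at t+1) given its own state, its left neighbour, and the action."""
    if rebooted:
        return 1.0
    if up:
        return P_UP_NEIGHBOUR_UP if left_up else P_UP_NEIGHBOUR_DOWN
    return P_RECOVER


def aggregate_value(p0: List[float], plan: List[Set[int]]) -> float:
    """One aggregate trajectory: propagate marginals P(x_i = 1) assuming independence."""
    p = list(p0)
    n, total = len(p), 0.0
    for reboot in plan:
        new_p = []
        for i in range(n):
            q = p[i - 1]  # marginal of the left neighbour (index -1 wraps around the ring)
            r = i in reboot
            # Expected step_prob over the four joint (own, neighbour) states, using the
            # independence assumption P(x_i, x_{i-1}) = P(x_i) * P(x_{i-1}).
            new_p.append(
                p[i] * q * step_prob(True, True, r)
                + p[i] * (1 - q) * step_prob(True, False, r)
                + (1 - p[i]) * q * step_prob(False, True, r)
                + (1 - p[i]) * (1 - q) * step_prob(False, False, r)
            )
        p = new_p
        total += sum(p)  # expected number of machines up (the reward) at this step
    return total


def monte_carlo_value(x0: List[bool], plan: List[Set[int]], samples: int, seed: int = 0) -> float:
    """Baseline: average return over many individually sampled trajectories."""
    rng = random.Random(seed)
    n, total = len(x0), 0.0
    for _ in range(samples):
        x = list(x0)
        for reboot in plan:
            # The comprehension reads the previous state x before x is reassigned.
            x = [rng.random() < step_prob(x[i], x[i - 1], i in reboot) for i in range(n)]
            total += sum(x)
    return total / samples


if __name__ == "__main__":
    idle_plan = [set() for _ in range(5)]  # no reboots over a 5-step horizon
    print("aggregate:  ", aggregate_value([1.0] * 4, idle_plan))
    print("monte carlo:", monte_carlo_value([True] * 4, idle_plan, samples=5000))
```

Because the probability of staying up contains a product of a machine's state and its neighbour's state, the aggregate estimate is exact only while those variables are independent; once neighbours become correlated (in the idle run from an all-up start, from the third step onward) it is an approximation, which is the speed-for-accuracy trade the abstract describes.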

许多重要的问题需要控制多个执行器，或代理，并行，以实现一个共同的协调目标，在随机环境中。这样的问题的例子包括在具有多个电梯的建筑物中调度、管理用于消防和救援操作的团队、管理大公司的库存、控制机器人足球队以及控制机器人团队来管理仓库环境中的货架和订单。这些问题自然适合作为离散时间中央控制问题的公式，我们设计了一个算法，决定每个代理在任何时间步采取什么行动，以优化共同的目标。相应的计算问题，称为随机规划，是具有挑战性的，由于其庞大的规模。特别是，在任何感兴趣的问题实例中，可能的状态（例如，仓库中机器人、货架和商品的可能位置）和可能的联合动作（单个机器人的动作组合）的数量都是巨大的。现有技术的方法通常由于需要太多的时间来正确地搜索好的策略或者由于需要太多的存储器来存储中间值而失败。透过机率推论的透镜观察随机规划，本计画提出几种新颖的领域独立演算法，利用问题结构在时间限制下有效地计算近似解。项目资金主要用于支持博士生的培训和研究，因此直接支持国家重要的高影响力领域的人类发展。更具体地说，我们提出了三种相互竞争的方法来解决这些问题，所有这些方法都是从将有限时域控制问题制定为相应图形模型（也称为动态贝叶斯网络）中的概率推理中获得的。第一种方法使用蒙特卡罗搜索的思想，但通过引入聚合轨迹增加了一个强大的符号组件。聚集轨迹是通过模拟一个组成的符号模型下的独立性假设的随机变量。每个聚合轨迹提供近似的值估计，但可以替代许多单独的轨迹。通过这种方法，我们得到了快速的近似值和有效的控制下的时间约束。第二种方法使用问题结构将推理问题转化为整数线性规划，其中通过问题分解可以权衡解决方案的目标和质量以获得速度。一个新的建设表明如何回避的指数复杂性的问题，并获得一个序列的整数规划，都是小的和可分解的，以便产生有效的控制下的时间限制。第三种方法，或更准确地说，框架，建立在相应的动态贝叶斯网络的随机规划和概率推理之间的紧密联系。我们表明，前两种方法的变体可以从这个角度来看，并通过这一点，我们提出了新的推理算法来解决随机规划问题。此外，基于这种分析，我们提出了新的概率推理算法，和新的广义推理问题，超越了目前的边缘图在图形模型的研究。