Robotic-Specific Machine Learning

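Basic Information

Grant number:
329426068
Principal investigator:
Professor Dr. Oliver Brock
Amount:
--
Host institution:
Fachgebiet Robotics and Biology Laboratory
Host institution country:
Germany
Project category:
Research Grants
Fiscal year:
2017
Funding country:
Germany
Project duration:
2016-12-31 to 2023-12-31
Project status:
Completed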

Source:
https://gepris.dfg.de/gepris/projekt/329426068?language=en
Keywords:
Robotic Specific Machine Learning

Project Abstract

This project will develop robotics-specific machine learning methods that enable robots to efficiently learn complex behavior. The requirement for such methods follows directly from the no-free-lunch theorems (Wolpert, 1996), which prove that no machine learning method works better than random guessing when averaged over all possible problems. The only way to improve over random guessing is to restrict the problem space and incorporate prior knowledge about this problem space into the learning method.

Of course, there are machine learning methods that apply to a wide range of real-world tasks by incorporating fairly general priors, e.g. smoothness. However, even for solving relatively simple problems, such methods already require huge amounts of data and computation. The overall problem of robotics (learning behavior that maps a stream of high-dimensional sensory input to a stream of high-dimensional motor output from sparse feedback) is too complex to be solved by generic machine learning methods using realistic amounts of data and computation. Other approaches avoid this problem by incorporating task-specific prior knowledge, e.g. by engineering features and representations tailored to the robotic task at hand. However, these approaches do not generalize to new tasks.

This project proposes a middle ground between general and task-specific approaches to learning robot behavior. The key idea is to incorporate robotics-specific prior knowledge, i.e. priors that are consistent with a wide range of robotic tasks. Applying this idea requires two steps: a) discovering robotics-specific prior knowledge and b) incorporating these priors into machine learning methods. We can discover such priors by looking at the structure inherent in the interactions of robots and the physical world (e.g. physics, embodiment, objectness).

To incorporate such priors, we will relate them to internal state representations, which are an intermediate result in the mapping from the robot's sensory input to its motor output. Technically, we will incorporate robotics-specific priors by i) defining appropriate learning objectives and ii) restricting the hypothesis space. Our work will eliminate the need for task-specific feature engineering while keeping the data and computation requirements to a minimum. As a consequence, this project will enable robots to autonomously learn complex tasks from raw sensory input. We have extensive preliminary work showing the feasibility and the potential of our approach. This project will develop this idea further through:
1) Online learning during the interaction
2) Solving partially observable tasks by simultaneously learning task-specific state representations and recursive loops to estimate them
3) Learning structured state representations that make reinforcement learning more efficient for robotic tasks
4) Incrementally learning state representations for multiple related tasks
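
The two technical routes named in the abstract, defining a learning objective from a prior and restricting the hypothesis space, can be illustrated with a toy sketch. This is not the project's method: the specific prior used here (states should change slowly over time), the synthetic two-channel data, and every function name are assumptions made purely for illustration. The encoder is limited to unit-norm linear maps (the restricted hypothesis space), and the loss penalizes fast changes in the learned state (the prior expressed as an objective).

```python
import math
import random


def make_observations(n_steps, seed=0):
    """Toy data (hypothetical): channel 0 drifts slowly, channel 1 is fast noise."""
    rng = random.Random(seed)
    slow, observations = 0.0, []
    for _ in range(n_steps):
        slow += rng.gauss(0.0, 0.1)      # slowly changing quantity
        fast = rng.uniform(-1.0, 1.0)    # distractor that changes quickly
        observations.append((slow, fast))
    return observations


def encode(weights, observation):
    """Restricted hypothesis space: a linear map from observation to a 1-D state."""
    return weights[0] * observation[0] + weights[1] * observation[1]


def temporal_coherence_loss(weights, observations):
    """Objective derived from the assumed prior: mean squared state change per step."""
    states = [encode(weights, o) for o in observations]
    diffs = [(b - a) ** 2 for a, b in zip(states, states[1:])]
    return sum(diffs) / len(diffs)


def fit_encoder(observations, n_angles=180):
    """Grid search over unit-norm weights; fixing the norm rules out the all-zero state."""
    candidates = [
        (math.cos(math.pi * k / n_angles), math.sin(math.pi * k / n_angles))
        for k in range(n_angles)
    ]
    return min(candidates, key=lambda w: temporal_coherence_loss(w, observations))


if __name__ == "__main__":
    data = make_observations(500)
    w = fit_encoder(data)
    print("learned weights:", w)
    print("loss:", temporal_coherence_loss(w, data))
```

In this sketch the search settles on weights that put almost all their mass on the slowly drifting channel, without that channel ever being labeled as the relevant one. A grid search is used only to keep the example dependency-free; a gradient-based optimizer would be the usual choice for larger encoders.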
