Fluidity in simulated human-robot interaction with speech interfaces


Basic information

  • Grant number:
    EP/X009343/1
  • Principal investigator:
  • Amount:
    $597,900
  • Host institution:
  • Country of host institution:
    United Kingdom
  • Project category:
    Research Grant
  • Fiscal year:
    2023
  • Funding country:
    United Kingdom
  • Start and end dates:
    2023 to (no data)
  • Project status:
    Ongoing

Project abstract

The need for interactive robots that can collaborate successfully with human beings is becoming important in the UK, considering some of the biggest challenges we now face: the need for high-value manufacturing exports to compete economically internationally, robots that can handle dangerous waste and navigate hazardous environments, and robotics solutions for social care and medical assistance to meet our demographic challenges.

A key problem for human-robot interaction (HRI) with speech, and one which limits the wider use of such robots, is a lack of fluidity. Although there have been significant recent advances in robot vision, motion, manipulation and automatic speech recognition, state-of-the-art HRI is slow, laboured and fragile. The contrast with the speed, fluency and error tolerance of human-human interaction is substantial. The FLUIDITY project will develop technology to monitor, control and increase the interaction fluidity of robots with speech-understanding capabilities, so that they become more natural and efficient to interact with. The project will also address the difficulty of developing HRI models, caused by the time, logistics and cost of working with real-world robots, by developing a toolkit for building and testing interactive robot models in a simulated Virtual Reality (VR) environment, making scalable HRI experiments possible for the wider robotics, HRI and natural language processing (NLP) communities.

The project focusses on pick-and-place robots which manipulate household objects in view, where users utter commands (e.g. "put the remote control on the table") and issue confirmations, corrections and repairs of the robot's current actions as appropriate (e.g. "no, the other table"), allowing rapid, natural responses both from a human confederate teleoperating the robot model and from automatic systems. Crucially, appropriate overlap of human speech and robot motion will be permitted to allow more human-like transitions.

The project will put interaction fluidity, and rapid recovery from misunderstanding with appropriate repair mechanisms, at the heart of interactive robots, leading to improved user experience. The means for achieving fluid interaction will firstly be the adaptation of Spoken Language Understanding (SLU) algorithms which are not only word-by-word incremental but go beyond that to provide more human-like, real-time measures of the confidence the robot has in its interpretation of the user's speech. As the basis for these algorithms, mediated Wizard-of-Oz data will be collected from pairs of human participants, with one participant acting as a confederate 'wizard' controlling the robot model and the other as the user. From the visual, audio and motion data collected, SLU algorithms will be built which return not only the most accurate user intention incrementally, word by word, but also a continuous measure of confidence corresponding as closely as possible to the reaction times of the human confederate.

The project will also address user perception of the robot's intention from the robot's motion by experimenting with different models of motion legibility. The hypothesis is that the more accurately the legibility of the robot's motion can be modelled in real time, the greater the fluidity of interaction possible, as user repairs and confirmations can be interpreted appropriately earlier in the robot's motion.

The SLU and legibility algorithms will be integrated in an end-to-end system where interaction fluidity can be controlled, with evaluation both in the VR environment and in comparison to a real-world robot model. The project will provide an abstract theoretical framework for interaction fluidity, along with practical outcomes: a VR environment, an HRI dataset collected in that environment which will be made publicly available for benchmarking, and open-source software adaptable to other robot models.
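The core idea of incremental, word-by-word interpretation paired with a continuous confidence measure can be illustrated with a toy sketch. Everything here is a hypothetical illustration, not the project's actual SLU model: the candidate objects, the `IncrementalSLU` class and its crude lexical-match scoring rule are all assumptions made for the example.

```python
# Toy sketch: after each word of a command, update a score over candidate
# referents and expose a running confidence in the current best guess.
from dataclasses import dataclass, field


@dataclass
class IncrementalSLU:
    # Candidate target objects in the robot's view (toy world model).
    candidates: list
    scores: dict = field(default_factory=dict)

    def __post_init__(self):
        # Start with a uniform score over all candidates.
        self.scores = {c: 1.0 for c in self.candidates}

    def consume_word(self, word):
        """Update candidate scores after one word; return (best, confidence)."""
        for cand in self.candidates:
            if word in cand.split():
                self.scores[cand] *= 3.0  # crude lexical-match boost
        total = sum(self.scores.values())
        best = max(self.scores, key=self.scores.get)
        confidence = self.scores[best] / total
        return best, confidence


slu = IncrementalSLU(["red mug", "blue mug", "remote control"])
for w in "put the remote control".split():
    best, conf = slu.consume_word(w)
    print(f"{w!r:12} -> best: {best!r}, confidence: {conf:.2f}")
```

In this sketch confidence rises as each informative word arrives ("remote", then "control"), which is the shape of signal a robot could use to decide how early to commit to a motion, or to reverse it when a repair like "no, the other table" drives confidence in the current interpretation back down.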

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)


Other publications by Julian Hough

“The sleep data looks way better than I feel.” An autoethnographic account and diffractive reading of sleep-tracking
  • DOI:
    10.3389/fcomp.2024.1258289
  • Publication date:
    2024
  • Journal:
  • Impact factor:
    2.6
  • Authors:
    A. N. Nagele;Julian Hough
  • Corresponding author:
    Julian Hough
Detecting Depression with Word-Level Multimodal Fusion
  • DOI:
    10.21437/interspeech.2019-2283
  • Publication date:
    2019
  • Journal:
  • Impact factor:
    11
  • Authors:
    Morteza Rohanian;Julian Hough;Matthew Purver
  • Corresponding author:
    Matthew Purver
Dialogue Structure of Coaching Sessions
  • DOI:
  • Publication date:
    2014
  • Journal:
  • Impact factor:
    0
  • Authors:
    I. D. Kok;Julian Hough;Cornelia Frank;David Schlangen;S. Kopp
  • Corresponding author:
    S. Kopp
The Subjectivities of Wearable Sleep-Trackers - A Discourse Analysis
  • DOI:
  • Publication date:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    A. N. Nagele;Julian Hough;Zara Dinnen
  • Corresponding author:
    Zara Dinnen
Data-Driven Learning in an Incremental Grammar Framework
  • DOI:
  • Publication date:
    2013
  • Journal:
  • Impact factor:
    0
  • Authors:
    Matthew Purver;Arash Eshghi;Julian Hough
  • Corresponding author:
    Julian Hough


Similar international grants

Reducing stigmatizing attitudes and behaviors of nursing students in simulated clinical visits of patients living with HIV in Iran
  • Grant number:
    10542953
  • Fiscal year:
    2023
  • Funding amount:
    $597,900
  • Project category:
Sex-Specific Angiogenic Responses of Vascular Cells to Simulated Microgravity
  • Grant number:
    10606937
  • Fiscal year:
    2023
  • Funding amount:
    $597,900
  • Project category:
How do Passive Exoskeletons influence Human Balance Control in simulated Real-World environments?
  • Grant number:
    2706351
  • Fiscal year:
    2022
  • Funding amount:
    $597,900
  • Project category:
    Studentship
CBTpro: Scaling up CBT for psychosis using simulated patients and spoken language technologies
  • Grant number:
    10618955
  • Fiscal year:
    2020
  • Funding amount:
    $597,900
  • Project category:
Effects of transplantation of human cranial bone-derived mesenchymal stem cells cultured in a simulated microgravity on rat cerebral infarction model
  • Grant number:
    20K09348
  • Fiscal year:
    2020
  • Funding amount:
    $597,900
  • Project category:
    Grant-in-Aid for Scientific Research (C)
CBTpro: Scaling up CBT for psychosis using simulated patients and spoken language technologies
  • Grant number:
    10004543
  • Fiscal year:
    2020
  • Funding amount:
    $597,900
  • Project category:
Maturation of stem cell-derived cardiomyocytes using a simulated microgravity bioreactor and combinations of extracellular developmental cues
  • Grant number:
    10006917
  • Fiscal year:
    2020
  • Funding amount:
    $597,900
  • Project category:
Simulated Conversation Training for Mental Healthcare Providers to Improve Care for Transgender and Gender Nonconforming Individuals.
  • Grant number:
    9796430
  • Fiscal year:
    2019
  • Funding amount:
    $597,900
  • Project category:
Hotspotting Cardiometabolic Disparities for Simulated Advances in Population Care
  • Grant number:
    10215376
  • Fiscal year:
    2018
  • Funding amount:
    $597,900
  • Project category:
Hotspotting Cardiometabolic Disparities for Simulated Advances in Population Care
  • Grant number:
    9768944
  • Fiscal year:
    2018
  • Funding amount:
    $597,900
  • Project category: