权益分类	功能权益	普通用户	{{item.name}}会员
{{category.name}}	{{benefitItem.name}}

A dynamic interactive account of human visual action understanding

人类视觉动作理解的动态交互描述

基本信息

批准号：
ES/X005534/1
负责人：
Paul Downing
金额：
$ 47.92万
依托单位：
Bangor University
依托单位国家：
英国
项目类别：
Research Grant
财政年份：
2023
资助国家：
英国
起止时间：
2023 至无数据
项目状态：
未结题

来源：
https://gtr.ukri.org/projects?ref=ES%2FX005534%2F1
关键词：
dynamic interactive account human visual

项目摘要

Daily life is filled with encounters with other people. Normally, we quickly and effortlessly understand the meaning of the actions they perform -- a remarkable human capacity. Doing so is key to normal social life, because those actions provide vital clues about others' intentions, beliefs, and personalities. For example, on seeing a family member chopping vegetables in the kitchen, we know that he intends to cook a meal; seeing a friend opening an umbrella suggests that she believes it will soon rain; and observing a stranger make a donation at a shop entrance indicates that she may be an empathetic person. How action understanding is so readily achieved remains poorly understood.Our project offers a novel view of human action understanding as arising from interactions of two mental processes. Perceptual systems gather evidence about the actions we see, extracting the objects, movements, body postures, and scene context that make up an action. Returning to the previous cooking example, these systems would locate and identify the knife, cutting board, vegetables, and other objects; compute the posture of the cook, his grasp of the knife, and its up-and-down movements; and describe the layout of the scene and identify it as a kitchen. Evidence from these perceptual systems interacts with a mental library of "action frames", each of which captures the typical roles, relationships, and reasons that comprise an action. For example, an action frame for "cooking" captures our knowledge that this generally involves the manipulation of food ingredients, with certain tools and movements, with the goal to transform them into an edible result, all of which typically takes place in a kitchen setting. Action frames also express some of our (normally unconscious) knowledge about probabilities related to actions. For example, we know that chopping motions are more likely to occur with a knife than a spoon; stirring often occurs in cooking but also in painting; and the kinds of actions that occur in a kitchen tend not to overlap with those typically seen in a garage. Action understanding arises when the activity of the perceptual systems and the action frames converges on a consistent interpretation, in which the key roles of the action frame are filled, and competing, less-likely action frames are excluded. We plan to test this framework in two ways. First, we have designed simple action-related tasks that will require judgments from human adult volunteers - such as to notice whether two actions that are shown one after the other are the same or not; or to judge whether a written label is the right or wrong one to describe an action picture. These tests are grouped under three broad themes. In brief, they examine: 1) the impact of degraded perceptual information on action understanding; 2) how expectations affect the efficiency of action understanding; and 3) how action frames "fill in" aspects of actions that we don't actually see. A fourth cross-cutting theme assesses how mental "load" (e.g. visual distractions or juggling multiple mental tasks) impacts action understanding. Our second approach is to model each of these tasks in detail with simple but powerful "neural network" computer models. These allow us to frame our predictions in a precise, quantitative way, and to make new predictions about how action understanding behaviour will unfold. With this combined approach, we hope to demonstrate how our framework explains at least some of the human ability to understand others' actions efficiently. We see potential for this framework to inform future research in child development, group dynamics, social learning, artificial vision or other disciplines with a stake in how human observers understand the meaning and learning opportunities behind others' actions. We propose to assemble an international Consortium of interested researchers from these and related disciplines, to accelerate those potential impacts.

日常生活充满了与他人的相遇。通常情况下，我们很快就能毫不费力地理解它们所做动作的意义--这是一种非凡的人类能力。这样做是正常社交生活的关键，因为这些行为提供了关于他人意图，信仰和个性的重要线索。例如，看到一个家庭成员在厨房切菜，我们知道他打算做饭;看到一个朋友打开雨伞，表明她相信很快就会下雨;看到一个陌生人在商店门口捐款，表明她可能是一个有同情心的人。动作理解是如何如此容易地实现的仍然知之甚少。我们的项目提供了一个新的观点，人类的动作理解产生于两个心理过程的相互作用。感知系统收集我们看到的动作的证据，提取构成动作的物体、动作、身体姿势和场景背景。回到前面的烹饪例子，这些系统将定位和识别刀，砧板，蔬菜和其他物体;计算厨师的姿势，他对刀的把握，以及它的上下运动;描述场景的布局并将其识别为厨房。来自这些感知系统的证据与“行动框架”的心理图书馆相互作用，每个框架都捕捉到了构成行动的典型角色，关系和原因。例如，“烹饪”的动作框架捕获了我们的知识，即这通常涉及使用某些工具和动作来操纵食物成分，目的是将它们转化为可食用的结果，所有这些通常都发生在厨房环境中。动作框架还表达了我们（通常是无意识的）对与动作相关的概率的一些知识。例如，我们知道切菜的动作更可能发生在刀上，而不是勺子上;搅拌经常发生在烹饪中，也经常发生在绘画中;厨房里发生的各种动作往往与车库里常见的动作不重叠。当感知系统和动作框架的活动收敛于一个一致的解释时，动作理解就产生了，在这个解释中，动作框架的关键角色被填补，而竞争的、不太可能的动作框架被排除在外。我们计划以两种方式测试这个框架。首先，我们设计了一些简单的与动作相关的任务，需要成年志愿者进行判断，比如注意两个相继出现的动作是否相同，或者判断一个书面标签对动作图片的描述是正确的还是错误的。这些测试分为三大主题。简而言之，他们审查：1）退化的感知信息对动作理解的影响; 2）期望如何影响动作理解的效率; 3）动作框架如何“填充”我们实际上看不到的动作方面。第四个交叉主题评估心理“负荷”（例如视觉干扰或应付多个心理任务）如何影响动作理解。我们的第二种方法是用简单但功能强大的“神经网络”计算机模型对这些任务进行详细建模。这使我们能够以一种精确的、定量的方式来构建我们的预测，并对行动理解行为将如何展开做出新的预测。通过这种结合的方法，我们希望展示我们的框架如何解释人类有效理解他人行为的能力。我们看到这个框架有可能为未来的儿童发展、群体动力学、社会学习、人工视觉或其他学科的研究提供信息，这些学科与人类观察者如何理解他人行为背后的意义和学习机会有关。我们建议组建一个国际联盟，由来自这些学科和相关学科的感兴趣的研究人员组成，以加速这些潜在的影响。