Interpreting Human Behaviour in Video using FSA's and Object Context

Basic information

Award number: 0534837
Principal investigator: David Forsyth
Amount: --
Host institution: University of Illinois at Urbana-Champaign
Host institution country: United States
Project type: Continuing Grant
Fiscal year: 2006
Funding country: United States
Duration: 2006-03-01 to 2010-02-28
Project status: Completed

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=0534837&HistoricalAwards=false
Keywords: Interpreting Human Behaviour Video using

Project abstract

PI: David Forsyth
Title: Interpreting Human Behavior in Video using FSA's and Object Context

Understanding what people are doing in a video is one of the great unsolved problems of computer vision. A fair solution opens tremendous application possibilities. The proposed work will use existing tools from the speech and object recognition community, in particular finite state automata (FSAs), to obtain an understanding of activities that depend on detailed information about the body.

The particular focus is everyday activity. In this case, a fixed vocabulary either doesn't exist or isn't appropriate. For example, one does not know words for behaviors that appear familiar. One way to deal with this is to work with a notation (for example, Laban notation), but such notations typically work in terms that are difficult to map to visual observables (for example, the weight of a motion). The alternatives are either to develop a vocabulary, or to develop expressive tools for authoring models.

This project will explore the third approach of building tools for authoring models of behavior quickly and expressively using finite-state methods. Research will explore a class of models that are easy to author from existing, or easily available, data. The interpretation of what someone is doing is affected by the objects nearby: a person standing near a bus stop is doing something different from a person standing near an office door. The models studied make it practical to investigate this phenomenon of object context, using recent advances from the object recognition literature.

Evaluating models for everyday behaviors is hard, because there is no prospect of obtaining a large collection of marked up video (among other things, there isn't a vocabulary in which to mark it up). This project will use proxies (statistics that are hard to measure from video without accurate inferences of behavior, but easy to measure in other ways) to evaluate behavior representations. These will make it possible to tell whether, for example, a model of buying a beverage represents the concept accurately.

Intellectual merits: This project will produce very large finite state models of behavior using the same hierarchical authoring methods used in speech. There will be a particular emphasis on behaviors which require one to understand the kinematic configuration of the body, a topic that has been very difficult to study to date, with an intention of identifying basic building blocks of a vocabulary of everyday behavior. The results should include datasets of public behavior that can be disseminated without encountering privacy concerns. New insights into the structure of human motion and behavior should emerge from (a) observations of people in public; (b) the process of authoring models; and (c) methods for identifying and modelling compositional structure in motion.

Broader impact: This project should make substantial progress on one of the key open and applicable problems in computer vision. Methods that can search video for particular behaviors and compute statistics of behaviors have a wide range of applications, including human-computer interfaces built around computers that can watch the body; an improved understanding of what people do in public, which will result in better architectural planning; and more efficient management of surveillance data, allowing searches for dangerous behaviors while preserving privacy.

Education and access: This project will contribute to the graduate training of several students, and the work described will contribute to a planned text on computing with human motion.

URL: http://luthuli.cs.uiuc.edu/~daf/action.html
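
As a rough illustration of the ideas in the abstract, the following Python sketch composes two small finite-state sub-models into a larger behavior model and then lets a nearby object change the interpretation. It is a toy under stated assumptions, not the project's method: the names `FSA`, `repeated`, `concat`, `interpret`, the motion symbols ("walk", "stand"), and the context labels are all hypothetical and do not come from the award record.

```python
from dataclasses import dataclass, field


@dataclass
class FSA:
    """A deterministic finite state automaton over string symbols."""
    start: str
    accepting: frozenset
    transitions: dict = field(default_factory=dict)  # (state, symbol) -> state

    def accepts(self, symbols):
        state = self.start
        for symbol in symbols:
            state = self.transitions.get((state, symbol))
            if state is None:  # no transition defined: reject
                return False
        return state in self.accepting


def repeated(name, symbol):
    """Sub-model accepting one or more repetitions of one motion symbol."""
    s0, s1 = f"{name}.0", f"{name}.1"
    return FSA(s0, frozenset({s1}), {(s0, symbol): s1, (s1, symbol): s1})


def concat(first, second):
    """Compose two sub-models in sequence (state names assumed distinct)."""
    transitions = {**first.transitions, **second.transitions}
    for (state, symbol), target in second.transitions.items():
        if state != second.start:
            continue
        # Let every accepting state of `first` continue into `second`.
        for acc in first.accepting:
            if (acc, symbol) in transitions:
                raise ValueError("composition would be nondeterministic")
            transitions[(acc, symbol)] = target
    accepting = set(second.accepting)
    if second.start in second.accepting:
        accepting |= first.accepting
    return FSA(first.start, frozenset(accepting), transitions)


# Hypothetical mapping from (behavior model, nearby object) to a label.
CONTEXT_LABELS = {
    ("approach_then_wait", "bus_stop"): "waiting for a bus",
    ("approach_then_wait", "office_door"): "waiting to enter an office",
}


def interpret(models, symbols, nearby_object):
    """Return a context-dependent label for the first model that accepts."""
    for name, fsa in models.items():
        if fsa.accepts(symbols):
            return CONTEXT_LABELS.get((name, nearby_object), name)
    return None


models = {
    "approach_then_wait": concat(
        repeated("approach", "walk"), repeated("wait", "stand")
    )
}
```

Under these assumptions, the same observed sequence `["walk", "walk", "stand"]` is accepted by one model but is labelled differently depending on whether `nearby_object` is `"bus_stop"` or `"office_door"`, which mirrors the bus stop and office door example in the abstract.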
