RI: Small: Learning Fine-Grained Instructions from Uncurated Complex Activity Videos

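Basic Information

Award number:
2115110
Principal investigator:
Ehsan Elhamifar
Amount:
Approx. $497,700
Institution:
Northeastern University
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2021
Funding country:
United States
Period:
2021-10-01 to 2024-09-30
Project status:
Completed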

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2115110&HistoricalAwards=false
Keywords:
RI Small Learning Fine Grained

Project Abstract

Humans have the remarkable ability to learn complex tasks by watching others perform them and following their instructions. Bringing this capability to machines would have a far-reaching impact on the advancement of artificial intelligence, with important applications such as designing intelligent assistants and robots that learn to perform tasks, or guide humans through them, by mining instructional and everyday activity videos. Despite recent advances, video and activity understanding methods face major challenges in converting raw, untrimmed, long videos of complex activities into detailed and accurate instructions. These include large appearance and motion variations of instructions across videos, the high cost of gathering dense temporal annotations from long videos, the lack of a systematic way to integrate the different types of available noisy yet inexpensive labels for effective learning, and the difficulty of generating long-range future instructions. This project investigates a comprehensive mathematical framework for learning detailed and accurate instructions from untrimmed, long, complex activity videos, overcoming the aforementioned challenges. The research is accompanied by an integrated education and outreach plan, which involves mentoring high school and undergraduate students through Northeastern's Young Scholar Program and integrating the results of the project into undergraduate and graduate classes. The project will publicly release open-source software implementing the developed algorithms.

This project develops new unsupervised and self-supervised methods for task segmentation and subtask (instruction step) localization. It does so by investigating a multi-manifold model for tasks and by simultaneously learning manifolds and finding associations between them across videos, while incorporating task constraints and priors. The developed framework handles large appearance and motion variations of subtasks across videos and can leverage other modalities, such as video narrations and audio. The research team will develop a unified weakly-supervised visual grounding framework based on deep neural networks that learns from different types of available inexpensive, noisy weak labels, handles subtasks at the distribution tail, and generates future instructions from current observations. Furthermore, the team will investigate a new probabilistic deep learning framework with hierarchically connected modules corresponding to subtask, grammar, and task prediction, making it possible to integrate all types of weak labels and to generate plausible future subtask sequences.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
