
CAREER: Toward Spatial-Temporal Architectures with Deformable and Interpretable Convolutions

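Basic Information

Award number:
1751402
Principal investigator:
Fuxin Li
Amount:
approx. $513,700
Awardee institution:
Oregon State University
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2018
Funding country:
United States
Period:
2018-04-01 to 2025-03-31
Status:
Not yet concluded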

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1751402&HistoricalAwards=false
Keywords:
CAREER Toward Spatial Temporal Architectures

Abstract

Artificial neural networks have successfully been applied to analyzing visual imagery. The goal of this project is to build a convolutional neural network (CNN) that can scale and deform automatically so that it is invariant to object size and pose. Currently, CNNs do not perform well even on an image rescaled to twice or half its size unless they were trained on rescaled images. This leads to a lot of redundancy in the model and unnecessary over-complication of the architecture. This project explores approaches to automatically figure out the correct scaling, as well as other transformations, from visual objects in images and videos. The proposed methods will also make convolutional neural networks easier to interpret and reduce the amount of data needed to train a network. Besides standard computer vision benchmarks, the research team evaluates the approach through collaborations that apply the technology to different applications, such as forestry and tumor-cell morphology. The educational goal of this project involves developing a new "what-you-see-is-what-you-get" (WYSIWYG) deep learning toolbox that enables people without much programming or mathematical skill to use deep learning for data analysis. The research team also plans to reach out to high schools and community colleges to introduce more than 100 students to deep learning and visual object recognition.

This research develops spatial-temporal CNNs that scale and deform automatically and are hence able to concisely represent object recognition models that generalize better under invariant and equivariant transformations unseen in the training set. The project explores novel auto-scaling and multi-deformable convolutional network architectures that use parametric motion fields to automatically locate the correct deformations of a visual object for each convolutional filter. In order to learn the motion fields from video, the research team uses a Siamese convolutional-deconvolutional network that predicts boundaries in two consecutive frames, and uses an output-to-output feedback loop to deduce boundary motion. The research team applies this approach to video segmentation and uses it to generate annotations for weakly supervised learning of the motion fields. The approach is evaluated on several tasks with limited annotations, such as video segmentation, multi-target tracking, and object classification and detection in videos under unseen deformations.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
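
The abstract's point about scale sensitivity can be illustrated with a toy 1-D sketch. This is not the project's architecture: the function names (`sample`, `scaled_response`), the toy signal, the filter weights, and the hand-picked scale of 2 are all assumptions made for illustration, whereas the project proposes to find the scale and other deformations automatically. The sketch only shows that a fixed filter responds differently to an upsampled pattern, and that spacing the filter taps by the matching scale recovers the original response.

```python
import math


def sample(signal, x):
    """Linearly interpolate `signal` at real position x (zero outside the signal)."""
    i = math.floor(x)
    frac = x - i

    def at(k):
        return signal[k] if 0 <= k < len(signal) else 0.0

    return (1.0 - frac) * at(i) + frac * at(i + 1)


def scaled_response(signal, weights, center, scale=1.0):
    """Filter response at `center`, with the filter taps spaced `scale` apart."""
    half = len(weights) // 2
    return sum(
        w * sample(signal, center + scale * (k - half))
        for k, w in enumerate(weights)
    )


# Toy pattern (a narrow bump) and the same pattern upsampled by a factor of 2.
small = [0.0, 1.0, 4.0, 1.0, 0.0]
big = [sample(small, j / 2) for j in range(2 * len(small) - 1)]

weights = [-1.0, 2.0, -1.0]  # hypothetical peak-detecting filter

r_small = scaled_response(small, weights, center=2)           # 6.0
r_big_fixed = scaled_response(big, weights, center=4)         # 3.0
r_big_scaled = scaled_response(big, weights, center=4, scale=2.0)  # 6.0

print(r_small, r_big_fixed, r_big_scaled)
```

With unit tap spacing the filter gives a weaker response on the upsampled bump, while doubling the tap spacing reproduces the response obtained on the original. In two dimensions the same idea generalizes to sampling each filter tap at positions displaced by a parametric field, with bilinear interpolation in place of the linear interpolation used here.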