CAREER: Exploiting Deep Generative Models for Visual Recognition

Basic Information

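Award number:
2239076
Principal investigator:
Jun-Yan Zhu
Amount:
$581,900
Host institution:
Carnegie-Mellon University
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2023
Funding country:
United States
Start and end dates:
2023-04-01 to 2028-03-31
Project status:
Ongoing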

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2239076&HistoricalAwards=false
Keywords:
CAREER Exploiting Deep Generative Models

Project Abstract

Modern visual recognition systems have achieved impressive results on standard benchmarks and work reliably for common objects and scenes, given massive data and annotations. Unfortunately, current systems struggle to detect rare or unseen objects and fail to adapt to new domains. Researchers, engineers, and domain experts have to capture and annotate huge amounts of real data, which is costly for common objects and impractical for rare objects and corner cases (i.e., cases that arise when multiple unique conditions occur simultaneously). To address these challenges and automatically create and label data that fully depict the corner cases, this project leverages the rich compositional structure and powerful synthesis capacity of large-scale generative models. Using these models, which can quickly synthesize diverse objects and scenes with unknown visual elements (e.g., new poses, weather, and lighting), this project will develop recognition algorithms that can recognize rare or unseen objects and adapt to continuously changing environments. This project has the potential to be transformative for various applications, such as autonomous driving, assistive robots, healthcare, e-commerce, and mixed reality. Furthermore, this research will translate to code, models, courses, and tutorials that are widely accessible to diverse stakeholders, and to education and research programs that engage with the broader community.

Directly using generative models is challenging, as it is highly unlikely that a randomly sampled image will cover a corner case that can improve recognition systems. To synthesize data that more closely resemble the long-tail distribution and new domains, this project will focus on three research thrusts. First, the project addresses learning visual recognition via generative models by exploring different methods of automatically generating data and annotations. Second, the project will analyze visual recognition systems through generative models by synthesizing diverse, continuously evolving test data to interrogate the system and understand its biases. Third, the project will automatically select and adapt generative models to new domains and tasks. These three thrusts are tightly connected: once the algorithms identify hard examples on which the current system fails, these examples can be used to close the loop between training and analysis. Finally, investigators will evaluate the developed methods by comparing methods with and without generative models.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal papers (3)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Ablating Concepts in Text-to-Image Diffusion Models

DOI:
10.1109/iccv51070.2023.02074
Publication date:
2023-03
Journal:
2023 IEEE/CVF International Conference on Computer Vision (ICCV)
Impact factor:
0
Authors:
Nupur Kumari;Bin Zhang;Sheng-Yu Wang;Eli Shechtman;Richard Zhang;Jun-Yan Zhu
Corresponding authors:
Nupur Kumari;Bin Zhang;Sheng-Yu Wang;Eli Shechtman;Richard Zhang;Jun-Yan Zhu

Expressive Text-to-Image Generation with Rich Text