III: Small: Collaborative Research: Algorithms for Query by Example of Audio Databases

Basic information

Award number: 1617107
Principal investigator: Zhiyao Duan
Amount: about $299,800
Host institution: University of Rochester
Host institution country: United States
Award type: Standard Grant
Fiscal year: 2016
Funding country: United States
Period: 2016-09-01 to 2020-08-31
Status: Completed

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1617107&HistoricalAwards=false
Keywords: III Small Collaborative Research Algorithms

Project abstract

Finding ways to automatically index, label, and access multimedia content (such as audio documents) is increasing in importance as multimedia repositories proliferate and grow. The community-generated SoundCloud repository is one example. It contains recordings of bands, sound effects, podcasts, etc., and contributors upload 12 hours of audio every minute. Repositories like SoundCloud typically tag audio at the file level with short text labels. Text-based search for a desired recording using these labels can be problematic. Text-based search within a track is not possible, since tracks are not indexed with tags in the body of the file. In this project, investigators at the University of Rochester and Northwestern University aim to develop methods and a system for audio search via query-by-example, where the example is similar, in some key way, to the desired audio in the database, but is not an exact match. This will allow search within files, bypassing the need for text-based tagging. This project will focus on using vocal imitations as search keys because they are natural for humans and are widely used in interaction. It will develop a novel search engine for sounds that takes vocal imitations as queries (e.g., imitation of a bird call to find recordings of the bird call). The technology developed for this novel way to search through audio/video collections will also benefit society in numerous other ways, such as crime surveillance (e.g., automated gunshot or scream detection for policy monitoring stations), biodiversity measurement (e.g., automatic ID of bird species that sound "like this" in field recordings), environmental awareness for the hearing impaired (e.g., alert me when my dog is the one barking), a production aid for a movie sound designer (e.g., finding door slam sounds in a database of thousands of sound effects), and sound-based diagnosis (e.g., "your car needs a new starter motor"). The project will benefit science, technology, engineering, and mathematics (STEM) education, as audio-based research has been shown to be a successful way to attract diverse college students into STEM disciplines.
Vocal imitation conveys rich information covering many acoustic aspects: pitch, loudness, timbre, their temporal evolutions, rhythmic patterns, etc. This lets a user query for precise sounds that are difficult to search for with text tags. For the same reason, however, vocal imitations may vary from the desired target on many dimensions. The query sound can also lie in a very constrained sound space compared to the sounds to be retrieved, due to the physical constraints of the human vocal system. Building a successful query-by-vocal-imitation system will require research into methods for representing audio and retrieving audio based on queries that are similar to target sounds only on a subset of their measurable dimensions. It will also require interfaces that facilitate providing queries and refining search results in a non-text-based context. For the former, the investigators will research methods for learning aspect-specific audio representations using deep neural networks. The investigators will also develop matching algorithms suitable for these representations. For the latter, the investigators will design novel search interfaces that let users iteratively refine their search results. The system will learn from the interactions and adjust the weightings of different acoustic aspects to search for the wanted sound.
Expected outcomes of this research are: (1) audio representations that highlight perceptually relevant features of vocal queries for matching to general audio target sounds; (2) algorithms for matching and aligning vocal queries to general audio; (3) interaction methods for iteratively refining search results using vocal imitations and sound examples; (4) a large vocal imitation and sound dataset; and (5) an open-source sound retrieval system that embodies these outcomes. More information about this project can be found at the project web site (http://www.ece.rochester.edu/projects/air/projects/audiosearch).
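
The retrieval loop described in the abstract (per-aspect representations, matching, and reweighting of acoustic aspects from user feedback) can be illustrated with a toy sketch. This is not the project's method: the abstract proposes representations learned by deep neural networks, whereas here each aspect is a hand-written two-dimensional vector. The cosine similarity, the linear weighting, the update rule, and every name in the code (`rank`, `update_weights`, the sample data) are assumptions made purely for illustration.

```python
import math

# Hypothetical aspect names, loosely following those listed in the abstract.
ASPECTS = ("pitch", "loudness", "timbre", "rhythm")


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def aspect_scores(query, sound):
    """Per-aspect similarity between a query and one candidate sound."""
    return {a: cosine(query[a], sound[a]) for a in ASPECTS}


def rank(query, database, weights):
    """Return sound ids ordered by weighted sum of aspect similarities, best first."""
    def score(sound_id):
        s = aspect_scores(query, database[sound_id])
        return sum(weights[a] * s[a] for a in ASPECTS)
    return sorted(database, key=score, reverse=True)


def update_weights(weights, query, relevant, irrelevant, rate=0.5):
    """Shift weight toward aspects where the relevant sound matches better.

    Assumed update rule: add rate * (relevant score - irrelevant score)
    to each weight, clip at zero, then renormalise so the weights sum to 1.
    """
    rel = aspect_scores(query, relevant)
    irr = aspect_scores(query, irrelevant)
    raw = {a: max(weights[a] + rate * (rel[a] - irr[a]), 0.0) for a in ASPECTS}
    total = sum(raw.values())
    if total == 0:
        return dict(weights)  # nothing to learn from; keep the old weights
    return {a: raw[a] / total for a in ASPECTS}


# Toy data: the imitation matches "bird" on pitch and rhythm only,
# and matches "door" on loudness and timbre (and partly on rhythm).
query = {a: [1.0, 0.0] for a in ASPECTS}
database = {
    "bird": {"pitch": [1, 0], "loudness": [0, 1], "timbre": [0, 1], "rhythm": [1, 0]},
    "door": {"pitch": [0, 1], "loudness": [1, 0], "timbre": [1, 0], "rhythm": [1, 1]},
}
uniform = {a: 1.0 / len(ASPECTS) for a in ASPECTS}

first_pass = rank(query, database, uniform)
# The user marks "bird" as the wanted sound and "door" as unwanted.
learned = update_weights(uniform, query, database["bird"], database["door"])
second_pass = rank(query, database, learned)
print(first_pass, second_pass)
```

With uniform weights the toy query ranks "door" first. After one round of feedback the weights on loudness and timbre drop to zero and "bird" moves to the top, which mirrors the idea that a vocal imitation resembles its target on only a subset of dimensions.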

Project outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Zhiyao Duan

Amorphous Cobalt Oxide Nanoparticles as Active Water-Oxidation Catalyst

DOI:
Published: 2017
Journal: ChemCatChem
Impact factor: 4.5
Authors: Zheng Chen; Zhiyao Duan; Zhiliang Wang; Xiaoyan Liu; Lin Gu; Fuxiang Zhang; Michel Dupuis; Can Li
Corresponding author: Can Li

EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis

DOI:
Published: 2023
Journal: arXiv.org
Impact factor: 0
Authors: Ge Zhu; Yutong Wen; M. Carbonneau; Zhiyao Duan
Corresponding author: Zhiyao Duan

SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan

DOI: 10.48550/arxiv.2405.05244
Published: 2024
Journal: ArXiv
Impact factor: 0
Authors: You Zhang; Yongyi Zang; Jiatong Shi; Ryuichi Yamamoto; Jionghao Han; Yuxun Tang; T. Toda; Zhiyao Duan
Corresponding author: Zhiyao Duan