Echo - A New Set of High-level Audio Features for Computational Sound Design Systems

Basic information

Grant number: RGPIN-2021-02893
Principal investigator: Thorogood, Miles
Amount: $17,500
Host institution: University of British Columbia
Host institution country: Canada
Program: Discovery Grants Program - Individual
Fiscal year: 2022
Funding country: Canada
Period: 2022-01-01 to 2023-12-31
Status: Completed

Source: https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=761816
Keywords: Echo New Set level Audio

Project abstract

I propose to advance knowledge of sound analysis models and algorithms in order to build new computational tools that increase the capabilities of audio retrieval systems and sound engineering. First, we will investigate new high-level audio descriptors and build an annotated dataset. Second, we will create gold-standard predictive models and multiple-output neural network models for audio signal recognition.

First, our past work on modeling mood-based audio descriptors showed that many attributes of the human listening experience are missing from the state-of-the-art challenges facing Music Information Retrieval. We will design a listening experiment in which users and sound design experts code varied audio stimuli, from which we will derive a set of sound-describing words. We will analyze the study results to reveal correlations between expert and user experience and to obtain high-level audio descriptors. From this set of audio descriptors and their associated meanings, we will develop a novel lexicon that addresses this gap in the literature. Building on our past success in creating large datasets of annotated audio files through crowd-sourcing, we will design an experiment using an online ranking-based questionnaire in which annotators compare pairs of audio clips with respect to individual terms from the lexicon. We expect this research to label each audio file in the corpus with a rating value for every concept in the lexicon. This will be the first dataset annotated with a comprehensive set of sound design descriptors for Music Information Retrieval.

Second, to address the problem of how a machine can predict the sonic characteristics that are important to the user experience in video games, VR, and film, we will first run a series of machine learning experiments to create gold-standard models for predicting each of the features represented in the lexicon.
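The gold-standard models need one numeric target per clip and per lexicon concept, but the ranking-based questionnaire yields pairwise judgments. The abstract does not say how those judgments become ratings. The following is a minimal sketch, assuming a Bradley-Terry model fitted separately for each lexicon term with the standard minorization-maximization (MM) update; the function name `bradley_terry`, the clip IDs, and the judgment counts are hypothetical illustrations, not project data.

```python
import math
from collections import defaultdict


def bradley_terry(comparisons, n_iter=500, tol=1e-10):
    """Fit Bradley-Terry ratings for ONE lexicon term from (winner, loser) pairs.

    Assumption (not from the proposal): pairwise judgments are aggregated with
    the Bradley-Terry model via the minorization-maximization (MM) update.
    Every clip must win and lose at least once, otherwise its maximum-likelihood
    strength is not finite.
    """
    wins = defaultdict(float)    # total wins per clip
    n_pair = defaultdict(float)  # comparisons per unordered pair of clips
    clips = set()
    for winner, loser in comparisons:
        if winner == loser:
            continue  # a clip compared with itself carries no information
        wins[winner] += 1.0
        n_pair[frozenset((winner, loser))] += 1.0
        clips.update((winner, loser))

    strength = {c: 1.0 for c in clips}
    for _ in range(n_iter):
        updated = {}
        for i in clips:
            denom = 0.0
            for pair, n in n_pair.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += n / (strength[i] + strength[j])
            updated[i] = wins[i] / denom
        # Strengths are only defined up to scale: fix their geometric mean at 1.
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        updated = {c: v / math.exp(log_mean) for c, v in updated.items()}
        change = max(abs(updated[c] - strength[c]) for c in clips)
        strength = updated
        if change < tol:
            break
    # Log-strengths serve as ratings: centred on 0, higher means "more of" the term.
    return {c: math.log(v) for c, v in strength.items()}


# Hypothetical judgments for one term; clip IDs and counts are illustrative only.
judgements = (
    [("clip_a", "clip_b")] * 8 + [("clip_b", "clip_a")] * 2
    + [("clip_b", "clip_c")] * 7 + [("clip_c", "clip_b")] * 3
    + [("clip_a", "clip_c")] * 9 + [("clip_c", "clip_a")] * 1
)
ratings = bradley_terry(judgements)
print(sorted(ratings, key=ratings.get, reverse=True))  # → ['clip_a', 'clip_b', 'clip_c']
```

Fitting one model per term keeps each concept's rating scale independent, which matches the one-model-per-concept design; handling ties or annotator bias would need an extended model.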
We will run experiments to train, tune, and evaluate different supervised machine learning models, and will assign a selected set of audio features and an optimized model to each corresponding concept in the lexicon. Next, we will investigate a deep neural network (DNN) model that predicts all 200+ features simultaneously. For this model, we will experiment with different neural network topologies, using both built-from-scratch techniques and TensorFlow, to develop a deep learning model for multi-output regression. The trained model will be evaluated with standard regression metrics and compared against the set of gold-standard models. We plan to build on the Essentia project (an open-source library and tools for audio and music analysis, description, and synthesis) to release the gold-standard models and the multi-output regression DNN model as an open-source library for audio and music analysis. This research directly involves 8 highly qualified personnel, working in interdisciplinary teams, who will be trained to become leaders in advanced music information retrieval and creative A.I.
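The abstract states that the multi-output network will be evaluated with standard regression metrics and against the gold-standard models, without naming the metrics. The following is a minimal sketch, assuming MAE, RMSE, and R² computed per concept on a shared held-out set, with lower RMSE as the criterion for which model wins; the function names, the concepts `bright` and `warm`, and all values are hypothetical toy numbers, not results. The network itself is not shown.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, RMSE and R^2 for one lexicon concept (assumed metric set)."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}


def compare_to_gold_standard(y_true, gold_pred, multi_pred):
    """Compare the multi-output model with the per-concept gold-standard models.

    Each argument maps concept -> list of ratings for the same held-out clips.
    gold_pred[c] comes from concept c's own gold-standard model; multi_pred[c]
    is the matching output column of the single multi-output network.
    Lower RMSE is used as the (assumed) criterion for which model wins.
    """
    report = {}
    for concept, truth in y_true.items():
        gold = regression_metrics(truth, gold_pred[concept])
        multi = regression_metrics(truth, multi_pred[concept])
        report[concept] = {
            "gold": gold,
            "multi": multi,
            "multi_wins": multi["rmse"] < gold["rmse"],
        }
    return report


# Hypothetical concepts and toy ratings, not project data or results.
y_true = {"bright": [0.2, 0.5, 0.9], "warm": [0.7, 0.4, 0.1]}
gold_pred = {"bright": [0.25, 0.45, 0.85], "warm": [0.6, 0.5, 0.2]}
multi_pred = {"bright": [0.3, 0.6, 0.7], "warm": [0.68, 0.42, 0.12]}

report = compare_to_gold_standard(y_true, gold_pred, multi_pred)
for concept, row in report.items():
    print(concept, round(row["gold"]["rmse"], 3), round(row["multi"]["rmse"], 3), row["multi_wins"])
```

Reporting the comparison per concept, rather than averaging across all outputs, shows which lexicon concepts the shared network predicts less accurately than a dedicated model.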
