Deep Learning Systems for Musical Audio Generation

Basic Information

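Grant number:
RGPIN-2020-05968
Principal investigator:
Oore, Sageev
Amount:
approx. $21,100
Host institution:
Dalhousie University
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2022
Funding country:
Canada
Start and end dates:
2022-01-01 to 2023-12-31
Project status:
Completed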

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=750278
Keywords:
Deep Learning Systems Musical Audio

Project Abstract

Imagine you are creating a soundtrack for a video, and you have a machine learning (ML)-driven music creation tool/assistant. Your ML assistant can generate sounds and musical clips, but there is a problem: it can neither follow instructions nor figure out what needs to be done. For an assistant to be helpful to humans, there must be a way to direct what it does and how it does it. My research programme is concerned with ML in the sphere of audio and music generation, and this proposal is concerned with imbuing generative models for music and audio with powerful controls (e.g. to allow effective ML-based tools). Specifically, I plan to explore this in two main musical contexts: (1) generating sequences of distinct notes (e.g. keypresses on a piano), and (2) generating raw audio files, i.e. soundwaves, one "measurement" at a time (where the lowest-quality sounds have at least 16,000 such measurements per second).

Finely controlling generation in these domains is hard for reasons including: (1) Our vocabulary to describe sounds is limited and ill-defined. We can hear that this violin sounds "richer" than that one, or that this speech has a "more articulated rhythm" than that one, but we may not know how to quantify these qualities. (2) Most data does not come with these labels. The ML challenges implied by these difficulties are fundamental ones: learning underlying structure in large, minimally labelled datasets of extremely long sequences.

So how do we approach the task of control? We first notice what people do: a teacher tells a student, "play it this way," providing a related example; that single example becomes immediately helpful, since the student already has a wordless mental map of sounds, and the example becomes an analogy that points to a spot on that map. I want to find techniques to allow control over generative models in such ways. This involves: (I) Learning good maps (~disentangled latent representations) of sound. For example, moving along one direction might mean more rhythmic in some way. (II) Learning these from mainly unlabelled data, making effective use of rare labelled examples (~semi-supervised learning).

This is important because: 1) Some ML problems inherent to this problem are fundamental, so their solutions will be fundamental as well. 2) Controlling generative models can allow them to be helpful: to artists, as they will provide effective creativity support to the creative economy; to amateur musicians, because such tools can easily have great educational value; and to health. Rhythmic music can help motor rehabilitation; imagine a tireless and adaptive music generator designed specifically for rehabilitation. Psychiatric diagnoses are sometimes based on non-verbal speech qualities; imagine controlling speech generation by examples ("use a voice like Person A, with an accent like Person B, and with the prosody of Person C") and, in doing so, helping training and removing potential bias effects.
