
CHS: Small: Robust Interactive Audio Source Separation


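Basic Information

Award number:
1420971
Principal investigator:
Bryan Pardo
Amount:
approx. $498,700
Host institution:
Northwestern University
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2014
Funding country:
United States
Project period:
2014-10-01 to 2018-09-30
Project status:
Completed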

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1420971&HistoricalAwards=false
Keywords:
CHS Small Robust Interactive Audio

Project Abstract

Algorithms to separate audio sources have many potential uses, such as extracting important audio data from historic recordings or helping people with hearing impairments select what to amplify and what to suppress in their hearing aids. Computer processing of audio content can potentially isolate the sound sources of interest and improve audio clarity whenever the content exhibits interference from multiple sound sources, for example by extracting a single voice of interest from a room full of voices. However, current sound source identification and separation methods are only reliable when there is a single predominant sound. This project will develop the science and technology needed to more easily isolate a single sound source from audio content with multiple competing sources, and to build interactive computer systems that guide users through an interactive source separation process, permitting the separation and recombining of sound sources in a manner that is beyond the reach of existing audio software. The outcomes of the project will improve the prospects for speech recognition in environments with multiple talkers, will be useful for many scientific inquiries such as biodiversity monitoring through the automated analysis of field recordings, and will be broadly useful whenever manual tagging of audio data is not practical.

While many computational auditory scene analysis algorithms have been proposed to separate audio scenes into individual sources, current methods are brittle and difficult to use, and as a result have not been broadly adopted by potential users. The methods are brittle in that each algorithm relies on a single cue to separate sources, and if the cue is not reliable then the method fails. The methods are difficult to use because the algorithms cannot predict which audio scenes they are likely to work on, and so the user does not know which method to apply in any given case. They are also difficult to use because their control parameters are hard to understand for users who lack expertise in signal processing. This project will research how to integrate multiple source separation algorithms into a single framework, and how to improve ease of use by exploring interfaces that permit users to interactively define what they wish to isolate in audio scenes, and that permit systems to provide users with guidance on selecting a tool and setting the necessary parameters. The project will produce an open-source audio source separation tool that embodies these scientific research outcomes.
