
EAGER: Example-based Audio Editing


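Basic information

Award number:
1451380
Principal investigator:
Paris Smaragdis
Amount:
$150,000
Institution:
University of Illinois at Urbana-Champaign
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2014
Funding country:
United States
Period:
2014-09-01 to 2018-05-31
Status:
Completed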

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1451380&HistoricalAwards=false
Keywords:
EAGER Example based Audio Editing

Abstract

Contemporary users of technology interact with photos and video by editing them, but still use audio only passively, by capturing, storing, transmitting, and playing it back. These two different ways of interacting with contemporary media persist because current software tools make it very difficult for general users to manipulate audio. This project will develop novel technologies that will make audio editing and manipulation accessible to non-experts. These tools will allow a user to guide the software with audio editing requests by vocalizing the desired edits, providing before/after examples of the desired effects, or by presenting other recordings that exhibit the desired audio manipulations. For example, a user might issue a command to the software to equalize sounds by using a booming voice for more bass, or a nasal tone for middle frequencies; to add echoes by mimicking the desired effect by uttering "hello, hello, hello ..." with each successive "hello" in a lower volume; or to add reverb by providing examples of recordings with the desired reverb. Making it easier for general computer users to manipulate and edit audio recordings can impact many fields, such as medical bioacoustics, seismic signal analysis, underwater monitoring, audio forensics, surveillance applications, oil exploration probing, conversational data gathering, and mechanical vibration measuring. The goals of this project are to provide novel and practical audio tools that will allow non-expert practitioners from these fields to easily achieve required audio manipulations. The project will exploit modern signal processing and machine learning techniques to produce more intuitive interfaces that help people accomplish what are currently difficult audio editing tasks. This will include developing novel estimators to extract editing-intent parameters directly from audio recordings.
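As an illustration of what "extracting editing-intent parameters" could look like for the vocalized echo example ("hello, hello, hello ..." with each repeat quieter), here is a minimal sketch. It is not the project's estimator. The function name `estimate_echo_params` and its inputs (onset times and peak levels of each detected utterance, assumed to come from some upstream voice detector) are hypothetical. Using the median gap is one simple way to tolerate uneven spacing in the vocalization.

```python
import math
from statistics import median


def estimate_echo_params(onsets_s, peak_levels):
    """Estimate (delay_s, decay, repeats) from a vocalized echo pattern.

    onsets_s    -- onset time in seconds of each detected utterance (hypothetical input)
    peak_levels -- positive peak amplitude of each utterance (hypothetical input)
    """
    if len(onsets_s) != len(peak_levels) or len(onsets_s) < 2:
        raise ValueError("need at least two utterances with matching levels")
    if any(level <= 0 for level in peak_levels):
        raise ValueError("peak levels must be positive")

    # Gaps between consecutive utterances; the median damps irregular spacing.
    gaps = [b - a for a, b in zip(onsets_s, onsets_s[1:])]
    delay_s = median(gaps)

    # Attenuation per repeat: geometric mean of consecutive level ratios.
    ratios = [b / a for a, b in zip(peak_levels, peak_levels[1:])]
    decay = math.exp(sum(math.log(r) for r in ratios) / len(ratios))

    # The first utterance is the dry sound; every later one counts as an echo.
    repeats = len(onsets_s) - 1
    return delay_s, decay, repeats


delay_s, decay, repeats = estimate_echo_params(
    [0.0, 0.5, 1.1, 1.6], [1.0, 0.5, 0.25, 0.125]
)
```

The three returned values correspond to the temporal spacing, attenuation, and number of repetitions that an echo effect would need.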
The project will focus on three different editing operations: equalization, noise control, and echo/reverberation. A number of different approaches will be explored for each operation. For example, for equalization, one approach will have users select before and after sounds to identify their desired modification, and the system will then use spectral deconvolution estimations to directly compute the transfer function that maps the spectrum of the before sound to that of the after sound, and apply that function to the audio recording that the user is editing. For noise control, one approach will have users vocalize what types of noise to remove, and then match the user's input with the corresponding component in the recording that is being edited by using low-rank spectral decomposition. For reverb and echo, one approach will have users utter "one, two, three, ..." to illustrate the desired number of repetitions, temporal spacing, and attenuation between echoes, and then use voice detection measurements to extract the echo parameters, while correcting for vocalization errors such as random inconsistency in the echo spacing. The project will create new theories of how human guidance and automated audio-intelligent processing can work in tandem to solve fundamental and practical problems.
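The before/after equalization idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's method: it estimates only a per-bin magnitude gain (the ratio of the "after" spectrum to the "before" spectrum) rather than a full deconvolution, it requires all three signals to have the same length, and it uses a naive O(n²) DFT to stay dependency-free (in practice an FFT routine such as `numpy.fft.rfft` would be the natural choice). The names `estimate_eq_gains` and `apply_eq` are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def estimate_eq_gains(before, after, eps=1e-9):
    """Per-bin magnitude gain mapping the 'before' spectrum to the 'after' spectrum."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    spec_before = dft(before)
    spec_after = dft(after)
    # eps avoids division by zero in bins where 'before' has no energy.
    return [abs(a) / (abs(b) + eps) for a, b in zip(spec_after, spec_before)]


def apply_eq(target, gains):
    """Apply the estimated gains to another recording of the same length."""
    if len(target) != len(gains):
        raise ValueError("target length must match the number of gain bins")
    spec = dft(target)
    return idft([g * s for g, s in zip(gains, spec)])


# Toy example: 'after' is 'before' made twice as loud at every frequency.
before = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
after = [2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
target = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
equalized = apply_eq(target, estimate_eq_gains(before, after))
```

Because the gains are real and symmetric for real-valued inputs, the equalized output stays real. A practical version would work on short overlapping frames and smooth the gains across frequency.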


Project outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)


Other publications by Paris Smaragdis

Recognizing speech from simultaneous speakers


DOI:
10.21437/interspeech.2005-852
Published:
2005
Journal:
Speech Commun.
Impact factor:
0
Authors:
B. Raj;Rita Singh;Paris Smaragdis
Corresponding author:
Paris Smaragdis

Complete and Separate: Conditional Separation with Missing Target Source Attribute Completion


DOI:
10.1109/waspaa58266.2023.10248081
Published:
2023
Journal:
2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
Impact factor:
0
Authors:
Dimitrios Bralios;Efthymios Tzinis;Paris Smaragdis
Corresponding author:
Paris Smaragdis