III: Medium: Spatial Sound Scene Description

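Basic Information

Award number:
1955357
Principal investigator:
Juan Bello
Amount:
$999,900
Institution:
New York University
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2020
Funding country:
United States
Project period:
2020-07-01 to 2024-06-30
Project status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1955357&HistoricalAwards=false
Keywords: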
III Medium Spatial Sound Scene

Project Abstract

Sound is rich with information about the surrounding environment. If you stand on a city sidewalk with your eyes closed and listen, you will hear the sounds of events happening around you: birds chirping, squirrels scurrying, people talking, doors opening, an ambulance speeding, a truck idling. In addition, you will also likely be able to perceive the location of each sound source, where it’s going, and how fast it’s moving. This project will build innovative technologies to allow computers to extract this rich information out of sound. By not only identifying which sound sources are present but also estimating the spatial location and movement of each sound source, sound sensing technology will be able to better describe our environments with microphone-enabled everyday devices, e.g., smartphones, headphones, smart speakers, hearing aids, home cameras, and mixed-reality headsets. For hearing-impaired individuals, the developed technologies have the potential to alert them to dangerous situations in urban or domestic environments. For city agencies, acoustic sensors will be able to more accurately quantify traffic, construction, and other activities in urban environments. For ecologists, this technology can help them more accurately monitor and study wildlife. In addition, this information complements what computer vision can sense, as sound can include information about events that are not easily visible, such as sources that are small (e.g., insects), far away (e.g., a distant jackhammer), or simply hidden behind another object (e.g., an incoming ambulance around a building's corner). This project also includes outreach activities involving over 100 public school students and teachers, as well as the training and mentoring of postdoctoral, graduate and undergraduate students.
This project will develop computational models for spatial sound scene description: that is, estimating the class, spatial location, direction and speed of movement of living beings and objects in real environments by the sounds they make. The investigators aim for their solutions to be robust across a wide range of sound scenes and sensing conditions: noisy, sparse, natural, urban, indoors, outdoors, with varying compositions of sources, with unknown sources, with moving sources, with moving sensors, etc. While current approaches show promise, they are still far from robust in real-world conditions and thus unable to support any of the above scenarios. These shortcomings stem from important data issues such as a lack of spatially annotated real-world audio data, and an over-reliance on poor quality, unrealistic synthesized data; as well as methodological issues such as excessive dependence on supervised learning and a failure to capture the structure of the solution space. This project plans an approach mixing innovative data collection strategies with cutting-edge machine learning solutions. First, it advances a novel framework for the probabilistic synthesis of soundscape datasets using physical and generative models. The goal is to substantially increase the amount, realism and diversity of strongly-labeled spatial audio data. Second, it collects and annotates new datasets of real sound scenes via a combination of high-quality field recordings, crowdsourcing, novel VR/AR multimodal annotation strategies and large-scale annotation by citizen scientists. Third, it puts forward novel deep self-supervised representation learning strategies trained on vast quantities of unlabeled audio data. Fourth, these representation modules are paired with hierarchical predictive models, where the top/bottom levels of the hierarchy correspond to coarser/finer levels of scene description. 
Finally, the project includes collaborations with three industrial partners to explore applications enabled by the proposed solutions. The project will result in novel methods and open source software libraries for spatial sound scene generation, annotation, representation learning, and sound event detection/localization/tracking; and new open datasets of spatial audio recordings, spatial sound scene annotations, synthesized isolated sounds, and synthesized spatial soundscapes. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outputs

Journal papers (12)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Wav2CLIP: Learning Robust Audio Representations from CLIP

DOI:
10.1109/icassp43922.2022.9747669
Published:
2021-10
Journal:
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Impact factor:
0
Authors:
Ho-Hsiang Wu;Prem Seetharaman;Kundan Kumar;J. Bello
Corresponding author:
Ho-Hsiang Wu;Prem Seetharaman;Kundan Kumar;J. Bello

Few-Shot Musical Source Separation

DOI:
10.1109/icassp43922.2022.9747536
Published:
2022
Journal:
Speech and Signal Processing (ICASSP
Impact factor:
0
Authors:
Wang, Yu;Stoller, Daniel;Bittner, Rachel M.;Pablo Bello, Juan
Corresponding author:
Pablo Bello, Juan

Sound Event Detection in Urban Audio with Single and Multi-Rate PCEN

DOI:
10.1109/icassp39728.2021.9414697
Published:
2021
Journal:
2021
Impact factor:
0
Authors:
Ick, Christopher;McFee, Brian
Corresponding author:
McFee, Brian

Micarraylib: Software for the Reproducible Aggregation, Standardization, and Signal Processing of Microphone Array Datasets

DOI:
Published:
2021
Journal:
Detection and Classification of Acoustic Scenes and Events, 2021
Impact factor:
0
Authors:
Roman, I. R.;Bello, J.P.
Corresponding author:
Bello, J.P.

Analyzing the Effect of Equal-Angle Spatial Discretization on Sound Event Localization and Detection

DOI:
Published:
2022
Journal:
Proceedings of the 7th Detection and Classification of Acoustic Scenes and Events 2022 Workshop (DCASE2022
Impact factor:
0
Authors:
Kushwaha, S.S.
Corresponding author:
Kushwaha, S.S.

Other publications by Juan Bello

EVALUATING POST-PROCEDURAL EFFECTS OF THE MEDTRONIC MICRA™ PACEMAKER ON CARDIAC FUNCTION

DOI:
10.1016/s0735-1097(24)02193-4
Published:
2024-04-02
Journal:
Conference abstract
Impact factor:
Authors:
Thomas Lee;Afif Hossain;Vinesh Jonnala;Navid Radfar;Yong Lee;Felix Afriyie;Juan Bello;Shriya Patel;Emad F. Aziz
Corresponding author:
Emad F. Aziz

THE MIGHTY MITRACLIP: A CASE OF CARDIOGENIC SHOCK SECONDARY TO SEVERE MITRAL REGURGITATION FROM FLAIL LEAFLET SUCCESSFULLY MANAGED BY MITRACLIP

DOI:
10.1016/s0735-1097(24)05843-1
Published:
2024-04-02
Journal:
Conference abstract
Impact factor:
Authors:
Juan Bello;Aysha Hussain;Paul Y. Lee;Kandarp Suthar;Perry Wengrofsky;Chunguang Chen
Corresponding author:
Chunguang Chen

Safety of routine protamine in the reversal of heparin in percutaneous coronary intervention: A systematic review and meta-analysis

DOI:
10.1016/j.ijcard.2023.131168
Published:
2023-10-01
Journal:
International Journal of Cardiology
Impact factor:
3.200
Authors:
Paul Y. Lee;Juan Bello;Catherine Ye;Shruti Varadarajan;Afif Hossain;Saahil Jumkhawala;Abhishek Sharma;Joseph Allencherril
Corresponding author:
Joseph Allencherril

Trastornos del control de los impulsos y punding en la enfermedad de Parkinson: la necesidad de una entrevista estructurada

DOI:
Published:
2011
Journal:
Impact factor:
0
Authors:
A. Ávila;X. Cardona;Juan Bello;P. Maho;F. Sastre;M. Martín
Corresponding author:
M. Martín

PE-EK A BOO: WHEN PE IS NOT REALLY PE - AORTIC DISSECTION WITH HEMATOMA MASQUERADING AS PULMONARY EMBOLISM

DOI:
10.1016/s0735-1097(24)06116-3
Published:
2024-04-02
Journal:
Conference abstract
Impact factor:
Authors:
Juan Bello;Yong Lee;Navid Radfar;Afif Hossain;Kirsys Guerrero;Jeffrey S. Lander
Corresponding author:
Jeffrey S. Lander