Research on Acoustic Event Detection and Audio Scene Recognition Methods in Complex Acoustic Environments - 猫眼课题宝


Research on Acoustic Event Detection and Audio Scene Recognition Methods in Complex Acoustic Environments

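Final Report

Grant No.: U1736210

Project category: Joint Fund Project

Funding: CNY 2.56 million (256.0 万元)

Principal investigator: Han Jiqing (韩纪庆)

Host institution: Harbin Institute of Technology (哈尔滨工业大学)

Discipline: F0111. Signal Theory and Signal Processing

Year completed: 2021

Year approved: 2017

Project status: Completed

Participants: Zheng Tieran (郑铁然), Lü Hairong (闾海荣), Jin Shengkai (金圣开), Zheng Guibin (郑贵滨), Tao Kun (陶焜), Wang Wei (王伟), Zhao Ming (赵明)

Keywords: audio scene recognition; acoustic event detection; complex acoustic environments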


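Abstract (translated from the Chinese)

The ability of machines to understand environmental sounds is an important direction in brain-inspired intelligence research. As a key aspect of machine cognition of environmental sound, acoustic event detection and audio scene recognition have received growing attention. Complex real-world acoustic environments, however, pose new challenges for both tasks. At the same time, substantial recent advances in the theory and techniques of signal processing and machine learning have opened new opportunities for research on acoustic event detection and audio scene recognition in complex acoustic environments. This project was proposed against that background. It plans to carry out fundamental research on audio signal denoising, feature selection and dimensionality reduction, and machine-learning-based methods for acoustic event detection and audio scene recognition. The project aims to produce a number of theories and techniques with independent intellectual property rights, to provide a theoretical basis and practical methods for improving machine cognition of environmental sounds, and to advance the discipline of brain-inspired auditory cognition.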

English Abstract

The ability of computers to understand environmental sounds is one of the most important research directions in brain-inspired intelligence. As one of the main aspects of computer cognition of environmental sounds, acoustic event detection and audio scene recognition have attracted more and more attention. However, complex acoustic environments pose new challenges for acoustic event detection and audio scene recognition. Meanwhile, rapid developments in the theories and technologies of signal processing and machine learning also bring new opportunities for these tasks. Against this background, this project is proposed and focuses on fundamental research in the denoising of audio signals, feature selection and dimensionality reduction, and new machine-learning-based methods for acoustic event detection and audio scene recognition. The main purpose of the project is to propose theories and technologies with independent intellectual property rights, to provide theoretical principles and practical methods for improving computer cognition of environmental sounds, and thereby to advance the field of brain-inspired auditory perception.

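This project focuses on acoustic event detection and audio scene recognition in complex acoustic environments. During its execution, work was carried out as required by the project plan; all planned research content has been completed, and some of it was extended with further research. Important progress was made in the following areas:

① a denoising method that takes into account prior knowledge in both the time domain and the transform domain;
② feature selection and dimensionality reduction for audio signals based on semi-supervised learning;
③ a feature representation method for acoustic events and audio scenes based on joint semantic mining;
④ a consistent feature representation method for audio scenes based on the fusion of foreground and background sound features;
⑤ acoustic event detection and audio scene recognition based on multi-layer multiple-kernel support vector machines;
⑥ validation of acoustic event detection and audio scene recognition in complex acoustic environments in specific industries.

The project team published 37 academic papers in journals and conference proceedings, of which 15 are indexed by SCI and 35 by EI. Three of the papers appeared in IEEE/ACM Trans. on Audio, Speech, and Language Processing, a top journal in the field, and 17 appeared at the top international conferences ICASSP, Interspeech, and NeurIPS; a further 2 papers were accepted by ICASSP 2022. The team applied for 13 national invention patents, of which 7 have been granted, and obtained 2 software copyright registrations. A total of 46 graduate students were trained, including 16 doctoral students and 30 master's students. One book was published by Tsinghua University Press in 2019.

Most importantly, the technologies developed have begun to be applied and commercialized at the Heilongjiang provincial branch under the General Technology Research Institute (通用技术研究院), supporting the development of its business.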

Journal Papers


Pyramidal Temporal Pooling With Discriminative Mapping for Audio Classification