Robust Speaker Verification in Real Application Scenarios

Basic Information

Grant number:
RGPIN-2019-05381
Principal investigator:
Alam, MdJahangir
Amount:
$20,400
Host institution:
Centre de recherche informatique de Montréal
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2021
Funding country:
Canada
Project period:
2021-01-01 to 2022-12-31
Project status:
Completed

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=738237
Keywords:
Robust Speaker Verification Real Application

Project Abstract

Spoken language is the most natural way for humans to communicate with each other. The speech signal conveys rich information, including language information, speaker information, and environmental information. Speaker verification is the task of verifying a person's identity from his or her voice using characteristic vocal information. The need to verify an individual's identity arises in several situations, including access control and the authorization of financial transactions. Recent developments in speech-based technologies have led to speech being touted as the future primary means of communication between humans and technology. As the use of speech becomes more ubiquitous, voice-based verification technologies, and speaker verification in particular, need improvement and innovation so that they work reliably in real-world scenarios.

To incorporate voice biometrics into real-world applications, it is important to ensure that verification performance is maintained even when speakers are in an adverse environment that the system did not encounter during training. Adverse conditions can be caused by background noise, reverberation, channel mismatch, language mismatch, and accent. Apart from domain robustness, a major concern in deploying speaker verification in real-world applications is robustness against fraudulent attacks, because speaker verification systems are vulnerable to spoofing.

This proposal focuses on building speaker verification systems that are domain-invariant and robust to spoofing attacks. To tackle the domain mismatch problem, our goal is to learn domain-invariant speaker embeddings using domain adversarial training. We propose deep learning architectures trained to classify both the speaker and the domain. The key insight of this approach is that the network is trained to get better at classifying speakers while getting worse at classifying domains. As a result, the network learns domain-invariant speaker representations. We also propose novel unsupervised domain adaptation approaches to bridge the source and target domains. To improve performance further, we consider combining these approaches with data augmentation and unsupervised probabilistic linear discriminant analysis (PLDA) adaptation methods. Finally, to make automatic speaker verification (ASV) technology robust against fraudulent attacks, we propose a method for blind automatic detection of spoofing attacks that requires no prior knowledge of the attack type. Here, deep countermeasures based on convolutional neural networks are proposed for anti-spoofing. The novelty of the proposed research lies in leveraging deep learning for domain-invariant representation learning, domain adaptation, and deep spoofing countermeasures. The expected results will have an impact on the speaker recognition research and commercial communities.
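The domain adversarial idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the adversarial objective is implemented; the code below assumes the common gradient-reversal formulation, in which the shared embedding extractor follows the speaker-loss gradient but moves against the domain-loss gradient. All names (`sgd_step`, `adversarial_step`, `lam`, `lr`) are hypothetical, and the gradients are passed in as plain lists rather than computed from a real network.

```python
from typing import List, Tuple

Vector = List[float]


def sgd_step(params: Vector, grads: Vector, lr: float) -> Vector:
    """Plain gradient-descent update: params - lr * grads."""
    return [p - lr * g for p, g in zip(params, grads)]


def adversarial_step(
    extractor: Vector,
    speaker_head: Vector,
    domain_head: Vector,
    grad_spk_extractor: Vector,  # d(speaker loss) / d(extractor weights)
    grad_dom_extractor: Vector,  # d(domain loss)  / d(extractor weights)
    grad_spk_head: Vector,       # d(speaker loss) / d(speaker head weights)
    grad_dom_head: Vector,       # d(domain loss)  / d(domain head weights)
    lr: float = 0.1,             # assumed learning rate
    lam: float = 1.0,            # assumed adversarial weight
) -> Tuple[Vector, Vector, Vector]:
    """One training step under the assumed gradient-reversal scheme.

    The two classifier heads minimise their own losses as usual. The
    shared extractor minimises the speaker loss but, because the sign of
    the domain gradient is flipped, it moves to make the domain harder
    to predict from the embedding.
    """
    combined = [
        gs - lam * gd  # sign flip on the domain term is the "reversal"
        for gs, gd in zip(grad_spk_extractor, grad_dom_extractor)
    ]
    return (
        sgd_step(extractor, combined, lr),
        sgd_step(speaker_head, grad_spk_head, lr),
        sgd_step(domain_head, grad_dom_head, lr),
    )


if __name__ == "__main__":
    # Toy numbers chosen only to show the direction of each update.
    new_extractor, new_speaker_head, new_domain_head = adversarial_step(
        extractor=[1.0, 2.0],
        speaker_head=[0.5],
        domain_head=[-0.5],
        grad_spk_extractor=[0.5, 0.0],
        grad_dom_extractor=[0.0, 1.0],
        grad_spk_head=[1.0],
        grad_dom_head=[1.0],
    )
    print(new_extractor, new_speaker_head, new_domain_head)
```

In this toy call, the second extractor weight increases even though the domain gradient for it is positive, which is the reversal at work. Setting `lam=0.0` recovers ordinary speaker-only training of the extractor.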
