
Robust Speaker Verification in Real Application Scenarios


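Basic information

Grant number:
RGPIN-2019-05381
Principal investigator:
Alam, MdJahangir
Amount:
$20,400
Host institution:
Centre de recherche informatique de Montréal
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2019
Funding country:
Canada
Start and end dates:
2019-01-01 to 2020-12-31
Project status:
Completed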

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=682754
Keywords:
Robust Speaker Verification Real Application

Project abstract

Spoken language is the most natural way we humans communicate with each other. The speech signal conveys rich information, including language information, speaker information, and environmental information. Speaker verification refers to the problem of verifying a person's identity from his or her voice using characteristic vocal information. The need to authenticate an individual's identity arises in several situations, including access control and authorization of financial transactions.

Recent developments in speech-based technologies have led to speech being touted as the future primary means of communication between humans and technology. As the use of speech becomes more ubiquitous, there is a need for improvement and innovation in voice-based verification technologies, specifically speaker verification, so that these methods work reliably in real-world scenarios.

To incorporate voice biometrics into real-world applications, it is important to ensure that verification performance is maintained even when speakers are in an adverse environment that the system did not encounter during training. Adverse conditions can be caused by background noise, reverberation, channel mismatch, language mismatch, and accent. Apart from domain robustness, a major concern with deploying speaker verification in real-world applications is the system's robustness against fraudulent attacks, because speaker verification systems are vulnerable to spoofing.

This proposal focuses on building speaker verification systems that are domain-invariant and robust to spoofing attacks. To tackle the domain mismatch problem, our goal is to learn domain-invariant speaker embeddings using domain adversarial training for robust speaker verification. We propose the use of deep learning architectures trained to classify both the speaker and the domain. The key idea of this approach is that the network gets better at classifying speakers while getting worse at domain classification. As a result, the network learns domain-invariant speaker representations. We also propose employing some novel unsupervised domain adaptation approaches to bridge the source and target domains. To improve performance further, we also consider combining these approaches with data augmentation and unsupervised PLDA adaptation methods.

Finally, to make automatic speaker verification (ASV) technology robust against fraudulent attacks, we propose a method for blind automatic detection of spoofing attacks that does not require any prior knowledge about the type of attack. Here, deep countermeasures based on convolutional neural networks are proposed for anti-spoofing.

The novelty of the proposed research lies in leveraging deep learning for domain-invariant representation learning, domain adaptation, and deep spoofing countermeasures. The expected results should have an impact on the speaker recognition research and commercial communities.
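The domain adversarial idea described in the abstract can be illustrated with a toy update step. The following is only a minimal sketch under strong assumptions, not the project's actual method: the encoder is a linear map to a scalar embedding, both heads are single-weight logistic classifiers, and the names `dat_step`, `lam`, and `lr` are hypothetical. The point it shows is the gradient reversal: the two heads descend their own losses, while the encoder descends the speaker loss and ascends the domain loss.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dat_step(w, a, b, x, spk, dom, lam=1.0, lr=0.1):
    """One SGD step of domain adversarial training on a single example.

    Illustrative assumptions (not from the proposal):
      w   -- encoder weights; the embedding is the scalar e = w . x
      a   -- weight of the speaker head, p_spk = sigmoid(a * e)
      b   -- weight of the domain head,  p_dom = sigmoid(b * e)
      spk, dom -- binary speaker and domain labels (0 or 1)
      lam -- strength of the reversed domain gradient
    """
    e = sum(wi * xi for wi, xi in zip(w, x))

    # For binary cross-entropy, dLoss/dlogit = prediction - label.
    d_spk = sigmoid(a * e) - spk
    d_dom = sigmoid(b * e) - dom

    # Heads: ordinary gradient descent on their own losses.
    grad_a = d_spk * e
    grad_b = d_dom * e

    # Encoder: speaker gradient minus lam times the domain gradient.
    # The minus sign is the gradient reversal.
    d_e = d_spk * a - lam * d_dom * b
    grad_w = [d_e * xi for xi in x]

    new_w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
    return new_w, a - lr * grad_a, b - lr * grad_b


if __name__ == "__main__":
    w, a, b = [1.0, 1.0], 1.0, 1.0
    x, spk, dom = [1.0, -1.0], 1, 1
    for lam in (0.0, 1.0, 2.0):
        new_w, _, _ = dat_step(w, a, b, x, spk, dom, lam=lam)
        print(lam, new_w)
```

With `lam=0.0` the encoder follows only the speaker loss. Increasing `lam` first cancels that update and then reverses it in this toy setting, which shows how the domain term pulls the encoder away from features the domain classifier can exploit. A practical system would use a deep network and an automatic differentiation framework rather than hand-written gradients.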
