Person-specific automatic speaker recognition: understanding the behaviour of individual speakers for applications of ASR

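Basic information

Grant number:
ES/W001241/1
Principal investigator:
Vincent Hughes
Amount:
$1.0322 million
Host institution:
University of York
Host institution country:
United Kingdom
Project category:
Research Grant
Fiscal year:
2022
Funding country:
United Kingdom
Duration:
2022 to (no data)
Project status:
Not concluded

Source:
https://gtr.ukri.org/projects?ref=ES%2FW001241%2F1
Keywords:
Person specific automatic speaker recognition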

Project abstract

Automatic speaker recognition (ASR) software processes and analyses speech to make decisions about whether two voices belong to the same or different individuals. Such technology is becoming an increasingly important part of our lives: it is used as a security measure when gaining access to personal accounts (e.g. banks), or as a means of tailoring content to a specific person on smart devices. Around the world, ASR systems are commonly used for investigative and forensic purposes, to analyse recordings of criminal voices where identity is unknown. Yet systems perform better or worse with certain voices. Therefore, a fundamental question remains: what makes a particular voice easy or difficult for ASR to recognise?

State-of-the-art systems, using techniques from artificial intelligence (AI), have shown marked improvements in performance compared with older approaches. However, there remain issues. Firstly, ASR research has focused on minimising the effects of well-known technical factors, such as channel (e.g. mobile vs. landline telephone), recording quality and microphones. In resolving these technical challenges, large improvements in systems have been achieved. Yet little is known about how speakers themselves affect ASR performance. Secondly, ASR research has been interested in reducing overall error rates. Yet, in the real world (where innocence and guilt may be at stake), the key question is: what is the chance the system has made an error in this specific instance? Finally, while AI approaches have undoubtedly brought improvements in overall performance, such algorithms make it more difficult to know what information systems are relying on to make decisions. This is problematic for forensic experts, who must explain their methods to non-expert end users, such as judges, juries, lawyers and police.

This project is the first to systematically assess how individual speakers perform within and across ASR systems and to compare speaker effects, in terms of linguistic properties of voices or speaker demographics (e.g. accent, ethnicity, gender), with well-studied technical effects. The aim is to use this knowledge to improve ASR systems by flagging potentially problematic speakers and to develop methods to handle these problematic speakers. We will use novel, interdisciplinary methods, bringing together expertise from speech technology, linguistics, and forensic speech science. Our collaboration with commercial ASR vendor Oxford Wave Research allows us to adapt and change systems to assess the effects on results for individual speakers. We will also use highly controlled, small-scale experiments to assess speaker effects in isolation, as well as using much larger datasets of more forensically realistic recordings, provided by our project partners, the UK Ministry of Defence and the Netherlands Forensic Institute. The availability of a variety of datasets also allows us to assess the generalisability of results across a range of voices.

This project is entirely driven by real-world issues and so the results will deliver considerable impact to a wide range of stakeholders. By understanding more about individuals, our results have the capability to improve overall ASR performance. This will be of benefit to users and developers of ASR systems. The results will also have specific implications for forensic and investigative applications, guiding data collection for validating methods (something which experts are under increasing regulatory pressure to do) and providing a framework for combining ASR and linguistic analysis. In doing so, through engagement with the legal community, we aim to effect a change in the status of ASR in England and Wales, such that it is admissible as expert evidence. We will deliver impact via knowledge exchange with a Forensic Advisory Panel consisting of representatives from forensic speech science, law enforcement, and the legal community.

Project outputs

Journal articles (2)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Reducing uncertainty at the score-to-LR stage in likelihood ratio-based forensic voice comparison using automatic speaker recognition systems

DOI:
10.21437/interspeech.2022-518
Publication year:
2022
Journal:
Impact factor:
0
Authors:
Wang B
Corresponding author:
Wang B

Other publications by Vincent Hughes

Voice Quality of Hesitations: Acoustic Measures and VPA ratings

DOI:
Publication year:
2018
Journal:
Impact factor:
0
Authors:
Amanda Cardoso; P. Foulkes; Peter French; Philip Harrison; Vincent Hughes; C. Kavanagh; Eugenia San Segundo
Corresponding author:
Eugenia San Segundo