Person-specific automatic speaker recognition: understanding the behaviour of individual speakers for applications of ASR
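Basic information
- Grant number: ES/W001241/1
- Principal investigator:
- Amount: $1.0322 million
- Host institution:
- Host institution country: United Kingdom
- Project type: Research Grant
- Fiscal year: 2022
- Funding country: United Kingdom
- Start and end dates: 2022 to (no data)
- Project status: Ongoing
- Source:
- Keywords:
Project abstract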
Automatic speaker recognition (ASR) software processes and analyses speech to decide whether two voices belong to the same individual or to different individuals. Such technology is becoming an increasingly important part of our lives: it is used as a security measure when gaining access to personal accounts (e.g. banks), or as a means of tailoring content to a specific person on smart devices. Around the world, ASR systems are commonly used for investigative and forensic purposes, to analyse recordings of criminal voices where identity is unknown. Yet systems perform better or worse with certain voices. Therefore, a fundamental question remains: what makes a particular voice easy or difficult for ASR to recognise?
State-of-the-art systems, using techniques from artificial intelligence (AI), have shown marked improvements in performance compared with older approaches. However, there remain issues. Firstly, ASR research has focused on minimising the effects of well-known technical factors, such as channel (e.g. mobile vs. landline telephone), recording quality and microphones. In resolving these technical challenges, large improvements in systems have been achieved. Yet little is known about how speakers themselves affect ASR performance. Secondly, ASR research has been interested in reducing overall error rates. Yet, in the real world (where innocence and guilt may be at stake), the key question is: what is the chance the system has made an error in this specific instance? Finally, while AI approaches have undoubtedly brought improvements in overall performance, such algorithms make it more difficult to know what information the systems rely on to make decisions. This is problematic for forensic experts, who must explain their methods to non-expert end users, such as judges, juries, lawyers and police.
This project is the first to systematically assess how individual speakers perform within and across ASR systems and to compare speaker effects, in terms of linguistic properties of voices or speaker demographics (e.g. accent, ethnicity, gender), with well-studied technical effects. The aim is to use this knowledge to improve ASR systems by flagging potentially problematic speakers and to develop methods to handle these problematic speakers. We will use novel, interdisciplinary methods, bringing together expertise from speech technology, linguistics, and forensic speech science. Our collaboration with commercial ASR vendor Oxford Wave Research allows us to adapt and change systems to assess the effects on results for individual speakers. We will also use highly controlled, small-scale experiments to assess speaker effects in isolation, as well as using much larger datasets of more forensically realistic recordings, provided by our project partners, the UK Ministry of Defence and the Netherlands Forensic Institute. The availability of a variety of datasets also allows us to assess the generalisability of results across a range of voices.
This project is entirely driven by real-world issues and so the results will deliver considerable impact to a wide range of stakeholders. By understanding more about individuals, our results have the capability to improve overall ASR performance. This will be of benefit to users and developers of ASR systems. The results will also have specific implications for forensic and investigative applications, guiding data collection for validating methods (something which experts are under increasing regulatory pressure to do) and providing a framework for combining ASR and linguistic analysis. In doing so, through engagement with the legal community, we aim to effect a change in the status of ASR in England and Wales, such that it is admissible as expert evidence. We will deliver impact via knowledge exchange with a Forensic Advisory Panel consisting of representatives from forensic speech science, law enforcement, and the legal community.
Project outputs
Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Reducing uncertainty at the score-to-LR stage in likelihood ratio-based forensic voice comparison using automatic speaker recognition systems
- DOI: 10.21437/interspeech.2022-518
- Year published: 2022
- Journal:
- Impact factor: 0
- Authors: Wang B
- Corresponding author: Wang B
Other publications by Vincent Hughes
Voice Quality of Hesitations: Acoustic Measures and VPA ratings
- DOI:
- Year published: 2018
- Journal:
- Impact factor: 0
- Authors: Amanda Cardoso; P. Foulkes; Peter French; Philip Harrison; Vincent Hughes; C. Kavanagh; Eugenia San Segundo
- Corresponding author: Eugenia San Segundo
Establishing typicality: A closer look at individual formants
- DOI: 10.1121/1.4805428
- Year published: 2013
- Journal:
- Impact factor: 2.4
- Authors: Vincent Hughes
- Corresponding author: Vincent Hughes
Sharing innovative methods, data and knowledge across sociophonetics and forensic speech science
- DOI: 10.1515/lingvan-2018-0062
- Year published: 2020
- Journal:
- Impact factor: 1.1
- Authors: Vincent Hughes; Jessica Wormald
- Corresponding author: Jessica Wormald
Speaker-specificity in speech production: The contribution of source and filter
- DOI:
- Year published: 2023
- Journal:
- Impact factor: 0
- Authors: Vincent Hughes; Amanda Cardoso; P. Foulkes; Peter French; A. Gully; Philip Harrison
- Corresponding author: Philip Harrison
Regional variation and the definition of the relevant population in likelihood ratio-based forensic voice comparison using cepstral coefficients
- DOI:
- Year published: 2014
- Journal:
- Impact factor: 0
- Authors: Vincent Hughes; P. Foulkes
- Corresponding author: P. Foulkes
Other grants held by Vincent Hughes
Humans and machines: novel methods for testing speaker recognition performance
- Grant number: AH/T012978/1
- Fiscal year: 2021
- Funding amount: $1.0322 million
- Project type: Research Grant
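Similar grants from the National Natural Science Foundation of China
Pathogenic mechanism by which the deubiquitinase USP15 regulates ILC3 differentiation and damages the intestinal mucosal barrier in neonatal necrotizing enterocolitis
- Grant number: 82371711
- Year approved: 2023
- Funding amount: CNY 490,000
- Project type: General Program
Species-specific PCR and multiplex PCR identification systems for fish species used in fish maw
- Grant number: 31902373
- Year approved: 2019
- Funding amount: CNY 230,000
- Project type: Young Scientists Fund
Gene mutation analysis and mutation origin in Dravet syndrome
- Grant number: 81171221
- Year approved: 2011
- Funding amount: CNY 580,000
- Project type: General Program
Expression regulation mechanism and function of the novel testis-specific gene TSC29
- Grant number: 81170613
- Year approved: 2011
- Funding amount: CNY 540,000
- Project type: General Program
Regulatory mechanism of the RNA-binding protein CUG-BP1 in mRNA degradation
- Grant number: 31000570
- Year approved: 2010
- Funding amount: CNY 200,000
- Project type: Young Scientists Fund
Physiological function of the meiosis-specific gene ISC10 in Cryptococcus neoformans
- Grant number: 30970130
- Year approved: 2009
- Funding amount: CNY 300,000
- Project type: General Program
Searching for regulatory genetic variants in schizophrenia
- Grant number: 30870899
- Year approved: 2008
- Funding amount: CNY 450,000
- Project type: General Program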
Similar overseas grants
Automatic Discovery of Domain Specific Languages
- Grant number: 568824-2022
- Fiscal year: 2022
- Funding amount: $1.0322 million
- Project type: Postgraduate Scholarships - Doctoral
Automatic treatment planning based on patient-specific dose distribution using deep learning
- Grant number: 20K16742
- Fiscal year: 2020
- Funding amount: $1.0322 million
- Project type: Grant-in-Aid for Early-Career Scientists
Automatic exposure control (AEC) for CT based on neural network-driven patient-specific real-time assessment of dose distributions and minimization of the effective dose
- Grant number: 428660931
- Fiscal year: 2019
- Funding amount: $1.0322 million
- Project type: Research Grants
Mount Sinai HHEAR Network Untargeted Lab Hub
- Grant number: 10675524
- Fiscal year: 2019
- Funding amount: $1.0322 million
- Project type:
Machine learning modeling of automatic detection of failure modes in IMRT patient-specific QA
- Grant number: 16K19226
- Fiscal year: 2016
- Funding amount: $1.0322 million
- Project type: Grant-in-Aid for Young Scientists (B)
Automatic Voice-Based Assessment of Language Abilities
- Grant number: 9191358
- Fiscal year: 2015
- Funding amount: $1.0322 million
- Project type:
Project IMPACT: In-the-Moment Protection from Automatic Capture by Triggers
- Grant number: 9203038
- Fiscal year: 2015
- Funding amount: $1.0322 million
- Project type:
Mount Sinai HHEAR Network Targeted Lab Hub
- Grant number: 10660986
- Fiscal year: 2015
- Funding amount: $1.0322 million
- Project type:
A Topic Model and Visualization for Automatic Summarization of Patient Records
- Grant number: 8919947
- Fiscal year: 2014
- Funding amount: $1.0322 million
- Project type: