Towards linguistically-informed automatic speaker recognition

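Basic information

Grant reference:
2279775
Principal investigator:
Amount:
--
Host institution:
University of York
Host institution country:
United Kingdom
Project type:
Studentship
Financial year:
2019
Funding country:
United Kingdom
Period:
2019 to (no data)
Project status:
Completed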

Source:
https://gtr.ukri.org/projects?ref=studentship-2279775
Keywords:
Towards linguistically informed automatic speaker

Project abstract

This project will investigate how Automatic Speaker Recognition (ASR) systems work and how they can be improved. ASR systems recognise speakers from their voice alone and are commonly used by banks and by government institutions such as HMRC. Such systems have improved in recent years thanks to continual refinement of methods and the availability of large databases of recordings on which to test them. State-of-the-art systems now make few errors, even with short, poor-quality recordings. However, ASR systems are a 'black box': we know that they analyse a speaker's voice, but we do not know what linguistic information they rely on to make their decisions. This project is exciting because it builds on a small but growing body of research at the intersection of linguistics and speech technology. It is also the first systematic study of the inner workings of such systems, and its outcomes will benefit society by improving the reliability of the security and speaker recognition systems used by banks and courtrooms. I am interested in this project because I have experience working with similar systems that identify individuals from written data. Those systems, however, are not 'black boxes' in the way that ASR systems working on spoken data are. I am therefore driven to understand the 'black box' of ASR systems, to ensure that systems using written and spoken data are equally reliable. Understanding this 'black box' is crucial because it will allow us to improve ASR systems further, particularly by understanding which 'types' of voices are difficult for them. ASR systems must also be explainable to lay people, e.g. jury members, in legal cases where their output is used as evidence.
The project will ask three research questions devoted to understanding and improving ASR systems:
RQ.1: To what extent do ASR systems capture tangible linguistic properties of a voice?
Firstly, we will investigate which linguistic properties of a voice map onto the abstract properties that ASR systems already detect. I hypothesise that many properties will be pertinent, e.g. vowel formants: the resonant frequencies that characterise each vowel sound and that are shaped in a speaker-specific way by each speaker's vocal tract and accent.
RQ.2: Can we predict which speakers will be problematic for the system?
Secondly, we will identify groups who may be problematic for ASR systems, so that the systems can be improved based on why these groups pose difficulties. Some accents have less vowel variation than others; as a result, their speakers could be at greater risk of being misrecognised as another speaker with the same accent, because there are fewer variables that uniquely identify each speaker.
RQ.3: Can linguistic information be used to improve the performance of ASR?
Finally, we will use linguistic speech analysis to improve ASR systems. By identifying the linguistic features that ASR systems use, we can tailor the systems to focus on these features and so improve their reliability.
This project uses a state-of-the-art speaker recognition system (VoiSentry) developed by the commercial partner, Aculab. My methodology will involve testing the VoiSentry software on voices that have been manipulated in controlled ways, e.g. by changing the acoustic properties of the vowel sounds, and observing whether this affects the final score. If it does, we will know that ASR systems capture tangible linguistic properties of voice, and we can therefore tailor these systems to focus on detecting these features. Aculab's involvement is key to this study because it will allow us to examine the system's underlying code, which no other ASR system permits. We can therefore make specific manipulations and test how they change the output score. Overall, this research will have societal value because it will ensure that the speaker recognition systems used by banks and government institutions are as reliable as possible.
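The controlled-manipulation methodology described above is essentially a sensitivity analysis: alter one acoustic property, re-score, and measure how far the score moves. The following Python sketch illustrates only that idea, under loudly stated assumptions. VoiSentry's interface and scoring method are not described in this record, so `toy_score` is a hypothetical stand-in that compares per-vowel mean F1/F2 values on a log scale. `shift_formants` and `sensitivity` are likewise illustrative names, and the formant values in the usage section are made-up round numbers, not measured data.

```python
import math

# Hypothetical toy representation: vowel label -> (F1, F2) mean formant frequencies in Hz.
Voiceprint = dict[str, tuple[float, float]]


def toy_score(enrolled: Voiceprint, test: Voiceprint) -> float:
    """Hypothetical stand-in for a system score (NOT VoiSentry's method).

    Returns the negative mean Euclidean distance between (log F1, log F2)
    pairs across vowels, so identical inputs score 0 and higher means more similar.
    """
    distances = []
    for vowel, (f1, f2) in enrolled.items():
        t1, t2 = test[vowel]
        distances.append(math.hypot(math.log(f1) - math.log(t1),
                                    math.log(f2) - math.log(t2)))
    return -sum(distances) / len(distances)


def shift_formants(vp: Voiceprint, vowel: str,
                   f1_factor: float = 1.0, f2_factor: float = 1.0) -> Voiceprint:
    """Return a copy of vp with one vowel's F1/F2 scaled: a controlled manipulation."""
    manipulated = dict(vp)
    f1, f2 = manipulated[vowel]
    manipulated[vowel] = (f1 * f1_factor, f2 * f2_factor)
    return manipulated


def sensitivity(enrolled: Voiceprint, test: Voiceprint,
                factors: list[float]) -> dict[tuple[str, float], float]:
    """Score change caused by scaling each vowel's F1 and F2 by each factor in turn."""
    baseline = toy_score(enrolled, test)
    changes = {}
    for vowel in test:
        for k in factors:
            manipulated = shift_formants(test, vowel, k, k)
            changes[(vowel, k)] = toy_score(enrolled, manipulated) - baseline
    return changes


if __name__ == "__main__":
    # Made-up illustrative values, not measured data.
    enrolled = {"i": (280.0, 2250.0), "a": (700.0, 1200.0), "u": (310.0, 870.0)}
    test = dict(enrolled)  # simplest case: the test voice matches the enrolled one
    for (vowel, k), delta in sorted(sensitivity(enrolled, test, [0.9, 1.1]).items()):
        print(f"/{vowel}/ scaled by {k}: score change {delta:+.4f}")
```

In a real study, `toy_score` would be replaced by the system's actual output score, and the manipulations would be applied to the audio itself rather than to a summary vector. The log scale is used here only so that a 10% shift weighs equally on low and high formants.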
