Computational Methods for Speech Analysis

Basic Information

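Award number: 2120087
Principal investigator: Christopher Lucas
Amount: approximately $249,300
Institution: Washington University
Institution country: United States
Award type: Standard Grant
Fiscal year: 2021
Funding country: United States
Period: 2021-08-01 to 2024-07-31
Status: Completed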

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2120087&HistoricalAwards=false
Keywords: Computational Methods Speech Analysis

Abstract

This research project will develop tools for testing hypotheses about human communication. Researchers generally study human communication through textual transcripts, which omit vocal tone. The project will directly address the disconnect between the data-generating process, in which speakers and listeners use the auditory channel to convey both textual and non-textual signals, and the widespread practice of discarding speech audio. The investigators will extend their prior speech model, the Model of Audio and Speech Structure, to address some of its limitations. In particular, the statistical extensions will accommodate multiple speakers and allow text and tone to be modeled jointly. To demonstrate the value of these extensions, the model will be applied to two original video corpora: police body-worn camera footage and campaign speeches for federal office. The investigators will also develop new software that lets researchers quickly annotate large amounts of speech audio. These browser-based tools will support automatic and manual segmentation as well as labeling. Multiple graduate students will gain experience in computationally intensive research and software development. The tools will be incorporated into ongoing public-private collaborations to improve oversight of police officers in the field.

This research project will extend the Model of Audio and Speech Structure (MASS), which analyzes conversation as a nested stochastic process in which (i) the flow of conversation unfolds as a sequence of utterances transitioning between speakers and their vocal tones, based on contextual covariates; and (ii) the auditory signal within each utterance follows a hidden Markov model that transitions between phonemes, each of which generates sound.
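
The two-layer structure described above can be illustrated with a minimal Python sketch. This is not the MASS implementation, because the abstract does not specify the parameterization. Several choices here are illustrative assumptions: the multinomial-logit (softmax) model for tone transitions in layer (i), the feature names, and the simplification of using tones alone as states rather than speaker-tone pairs. Layer (ii) is the standard HMM forward algorithm in log space, and it assumes that per-frame emission log-probabilities have already been computed by some acoustic model. The function names `next_tone_probs` and `hmm_log_likelihood` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def log_sum_exp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -math.inf:  # every term has probability zero
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


# Layer (i): flow of conversation. Hypothetical multinomial-logit model for
# the tone of the next utterance given the previous tone and covariates.
def next_tone_probs(
    prev_tone: str,
    covariates: Dict[str, float],
    weights: Dict[str, Dict[str, float]],
    tones: List[str],
) -> Dict[str, float]:
    """Return P(next tone = k | previous tone, covariates) for each k in tones.

    weights[k][feature] is the coefficient of `feature` in the score of tone k;
    missing coefficients count as 0. The previous tone enters as the indicator
    feature "prev=<tone>", alongside an always-on "intercept" feature.
    """
    features = dict(covariates)
    features["intercept"] = 1.0
    features[f"prev={prev_tone}"] = 1.0
    scores = [
        sum(weights.get(k, {}).get(f, 0.0) * x for f, x in features.items())
        for k in tones
    ]
    log_z = log_sum_exp(scores)  # softmax normalizer
    return {k: math.exp(s - log_z) for k, s in zip(tones, scores)}


# Layer (ii): audio within one utterance. Forward algorithm for an HMM whose
# hidden states are phonemes; emission log-probabilities are assumed to be
# precomputed for each audio frame.
def hmm_log_likelihood(
    log_init: List[float],
    log_trans: List[List[float]],
    log_emit: List[List[float]],
) -> float:
    """Return log P(all frames), summing over every phoneme path.

    log_init[k]     : log P(first phoneme is k)
    log_trans[j][k] : log P(next phoneme is k | current phoneme is j)
    log_emit[t][k]  : log P(audio frame t | phoneme k)
    """
    n_states = len(log_init)
    alpha = [log_init[k] + log_emit[0][k] for k in range(n_states)]
    for frame in log_emit[1:]:
        alpha = [
            log_sum_exp([alpha[j] + log_trans[j][k] for j in range(n_states)])
            + frame[k]
            for k in range(n_states)
        ]
    return log_sum_exp(alpha)
```

The forward recursion stays in log space and uses `log_sum_exp` because an utterance spans many audio frames. Multiplying that many small probabilities directly would underflow.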
The model enables social scientists to test hypotheses about how conversations are structured by fixed covariates (e.g., speaker gender, conversation role) and time-varying covariates (e.g., exogenous stimuli, or the endogenous trajectory of the conversation, such as the previous speaker's tone). In its current implementation, however, MASS has two key limitations. First, it relies on resource-intensive human annotation of each speaker's tone, which makes it impractical for contexts with many unique speakers, such as police body-worn camera footage. This project will develop extensions that let the model borrow strength through partial pooling across speakers with similar speech profiles. Second, MASS treats text as externally given metadata. The project will develop a new approach to jointly modeling text and audio by incorporating a dynamic topic model into the flow-of-conversation layer of MASS. The investigators will conduct two applications to demonstrate the value of the multi-speaker and joint text-audio modeling extensions.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
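
The abstract does not say how the multi-speaker extension pools information across speakers. As a generic illustration of partial pooling ("borrowing strength"), the sketch below makes several assumptions. It uses a normal-normal hierarchical model with known variances, applied to a single scalar speech feature. It also assumes the speakers have already been grouped as having similar speech profiles. The function name `partially_pooled_means`, the feature, and the data are all hypothetical.

```python
from statistics import fmean
from typing import Dict, List


def partially_pooled_means(
    samples_by_speaker: Dict[str, List[float]],
    within_var: float,
    between_var: float,
) -> Dict[str, float]:
    """Shrink each speaker's mean toward the mean across speakers.

    Normal-normal model with known variances (an illustrative assumption):
        x_ij ~ Normal(mu_i, within_var)   # observation-level feature
        mu_i ~ Normal(mu, between_var)    # speaker-level mean
    The posterior mean of mu_i is w_i * xbar_i + (1 - w_i) * mu, where
        w_i = n_i / (n_i + within_var / between_var),
    so speakers with little data are pulled harder toward the group.
    Here mu is plugged in as the unweighted mean of the speaker means.
    """
    speaker_means = {s: fmean(xs) for s, xs in samples_by_speaker.items()}
    group_mean = fmean(speaker_means.values())
    ratio = within_var / between_var
    pooled = {}
    for s, xs in samples_by_speaker.items():
        w = len(xs) / (len(xs) + ratio)  # weight on the speaker's own data
        pooled[s] = w * speaker_means[s] + (1 - w) * group_mean
    return pooled


# Hypothetical data: a speech feature (e.g., mean pitch in Hz) for two
# speakers assumed to share a similar speech profile.
print(partially_pooled_means({"A": [200.0, 210.0, 190.0], "B": [100.0]}, 100.0, 100.0))
# {'A': 187.5, 'B': 125.0}
```

The weight `w` approaches 1 as a speaker contributes more observations. Well-observed speakers therefore keep estimates close to their own mean, while sparsely observed speakers rely more on the group.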
