
Social Perceptions of Synthetic Speakers

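Basic Information

Grant number:
423651352
Principal investigator:
Professor Dr.-Ing. Sebastian Möller
Amount:
--
Host institution:
Quality and Usability Lab
Host institution country:
Germany
Project category:
Research Grants
Fiscal year:
2019
Funding country:
Germany
Project period:
2018-12-31 to 2021-12-31
Project status:
Completed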

Source:
https://gepris.dfg.de/gepris/projekt/423651352?language=en
Keywords:
Social Perceptions Synthetic Speakers

Project Abstract

Speech signals automatically induce social perceptions of the speaker in listeners. Through acoustic analysis and signal manipulation, a large body of knowledge has been accumulated on the acoustic correlates of social perceptions, such as spectral and prosodic parameters, and on the perceptual dimensions of natural speech. However, although modern speech synthesis paradigms provide very high quality, it is not yet understood whether results from natural speech also hold for synthesized speech. Hence, the major research question is: “Which acoustic features of synthesized speech affect subjective perceptions of social speaker characteristics?”

To answer this question, the project studies the social perception of the two basic social attributions, competence and benevolence, for text-to-speech (TTS) synthesizers in two potential application domains, with stimuli drawn from the topics of healthcare and customer service. Results are compared to those obtained from natural speech in earlier projects. The project tests whether competence and benevolence also emerge as basic social attributions, or whether other dimensions are more relevant. Regarding the speech signal, similarities and differences in acoustic parameters and their systematics are identified. A mid-term result is an acoustic prediction model of the identified social dimensions for synthesized speech.

On a methodological level, utterances are created with state-of-the-art TTS systems and systematically modified on the signal level to produce stimuli for empirical testing with human listeners. Crowd-sourcing techniques are applied for the required listening and rating tests. The final goal is to examine how acoustic features and patterns can be incorporated directly into modern TTS methodologies (Hidden Markov Models, Deep Neural Networks) instead of being applied through post-processing signal manipulation. This leads to the secondary research question: “Which alterations of the synthesis procedure lead to positive perceptions of speakers?” To this end, current approaches from speaker conversion are applied.

Apart from the fundamental knowledge gained from this research, the results will be relevant for TTS system developers seeking to efficiently improve voices for particular service domains.
