
Collaborative Research: RI: Medium: Flexible Deep Speech Synthesis through Gestural Modeling


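Basic Information

Award number:
2106928
Principal investigator:
Gopala Krishna Anumanchipalli
Amount:
$400,000
Host institution:
University of California-Berkeley
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2021
Funding country:
United States
Period:
2021-10-01 to 2024-09-30
Project status:
Completed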

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2106928&HistoricalAwards=false
Keywords:
Collaborative Research RI Medium Flexible

Project Abstract

Voice based interactions have become the norm everywhere from cars, to mobile phones to digital home assistants. As speech based machine interaction becomes more pervasive, there is increased demand and expectation of human-like performance and personality from these systems. It is important for the machine to deliver responses about the weather on a pleasant sunny day or an impending hurricane in an appropriate manner. Machines need to be able to respond sympathetically or emphatically depending on the context of their use. Critically, when machines fail, they should do so in human understandable ways, so that there are no unintended consequences of technology. This project aims to create more natural and flexible speech synthesis technology that is inspired by human strategies and mechanisms for speech production. Bringing together the science of speech production and current state-of-the-art engineering speech systems, this project aims to impart explainability, naturalness and flexibility to speech technologies. This project has the potential to impact all systems that use speech output like automated tutoring, interactive voice response, speech translation in commercial and military settings, digital assistants, robotics and rehabilitative healthcare applications like Brain-Computer Interfaces.

Current speech synthesis techniques are focused on end-to-end systems, avoiding explicit modeling of internal structure of the speech signal. Consequently, such systems may have good results but fail to allow any generalization beyond their recorded databases. This project concentrates on incorporating aspects of human speech production into computer speech synthesis. Using data-driven techniques and vocal tract imaging datasets, the project aims to discover and model compositional aspects of the speech signal as described by Articulatory Phonology. Novel deep-learning based approaches will be developed for joint optimization of diverse speech representations such as acoustic, phonological and physiological data within an analysis-by-synthesis framework. New strategies will be developed for incorporating grounded representations into text-to-speech training and evaluated in a range of applications in flexible speech synthesis.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
