Improving audio-visual speech recognition with augmented facial-mapping.

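Basic information

Grant number: 1964209
Principal investigator:
Amount: --
Host institution: University of Southampton
Host institution country: United Kingdom
Project category: Studentship
Fiscal year: 2017
Funding country: United Kingdom
Period: 2017 to (no data)
Project status: Completed

Source: https://gtr.ukri.org/projects?ref=studentship-1964209
Keywords: Improving audio visual speech recognition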

Project abstract

Research questions
Can audio-visual speech recognition be improved by augmenting it with emerging facial-mapping technology? Can real-time 3D face mapping and sound compartmentalisation improve audio-visual speech recognition accuracy?

Potential applications
At the time of writing, no known research exists on using the TrueDepth camera's facial recognition for audio-visual speech recognition. This may be because the technology is still in its infancy. An improved, integrated audio-visual speech recognition system could serve several purposes: better human-computer interaction for AI systems, a cheaper means of autonomous speech therapy, and language learning.

Objectives and aims
This research will apply machine learning principles to develop a more effective end-to-end solution for speech and facial (visual speech) recognition. A precise feedback engine will then use this solution to improve human accuracy and communication in these areas. The objective is to integrate the latest infrared and proximity sensors used for real-time face mapping effectively, in order to improve audio-visual speech recognition.

Methodology
Because this research spans computer science and linguistics, this paper will first investigate current deep-learning methods for audio-visual speech recognition, along with broader historical speechreading and natural language processing techniques. This paper will then explore the standalone accuracy of Apple's TrueDepth camera for visual speech recognition. The TrueDepth system is primarily used for facial recognition and animation, and is essentially the same technology contained in Microsoft's Kinect 3D-tracking accessory. It has since been miniaturised and improved by a middleware layer of machine learning software, which enables real-time mapping and articulation of 37 facial features with millimetre accuracy.
This research will first test how accurately the TrueDepth camera recognises a set of visemes (visual phonemes). To do this, it will record a large native-language learning dataset and iterate through a supervised deep learning algorithm. Once viseme recognition reaches an acceptable accuracy, the viseme recogniser will be combined with an existing audio-based speech recognition engine. The final stage will assess whether augmenting recognition with the TrueDepth camera system yields a statistically significant improvement over standalone speech recognition engines.
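The abstract does not say how the viseme recogniser will be combined with the audio engine, or which statistical test the final stage will use. The Python sketch below assumes one common option for each.

- **Fusion:** decision-level (late) fusion. Per-class log-probabilities from the audio and visual models are mixed with a hypothetical weight `visual_weight` and then renormalised.
- **Significance test:** an exact McNemar test on paired per-utterance correctness, comparing the fused system with the audio-only baseline.

The function names, labels and scores are illustrative only. They are not taken from the project.

```python
import math


def late_fusion(audio_logp, visual_logp, visual_weight=0.3):
    """Weighted log-linear (late) fusion of two models' per-class log-probabilities.

    visual_weight is a hypothetical tuning parameter in [0, 1];
    0 reproduces the audio-only engine.
    """
    if audio_logp.keys() != visual_logp.keys():
        raise ValueError("both models must score the same label set")
    combined = {
        label: (1 - visual_weight) * audio_logp[label] + visual_weight * visual_logp[label]
        for label in audio_logp
    }
    # Log-sum-exp renormalisation so the fused scores are log-probabilities again.
    peak = max(combined.values())
    log_z = peak + math.log(sum(math.exp(v - peak) for v in combined.values()))
    return {label: v - log_z for label, v in combined.items()}


def mcnemar_exact_p(baseline_correct, fused_correct):
    """Two-sided exact McNemar test on paired per-utterance outcomes (True = correct).

    Only discordant pairs (one system right, the other wrong) carry information.
    """
    if len(baseline_correct) != len(fused_correct):
        raise ValueError("outcome lists must be paired")
    b = sum(1 for x, y in zip(baseline_correct, fused_correct) if x and not y)
    c = sum(1 for x, y in zip(baseline_correct, fused_correct) if y and not x)
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test and capped at 1.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Illustrative scores: noisy audio slightly prefers "da"; the visual model sees lip closure.
    audio = {"ba": math.log(0.4), "da": math.log(0.45), "ga": math.log(0.15)}
    visual = {"ba": math.log(0.8), "da": math.log(0.1), "ga": math.log(0.1)}
    fused = late_fusion(audio, visual, visual_weight=0.5)
    print(max(fused, key=fused.get))  # prints: ba

    # Illustrative paired outcomes: 1 utterance only the baseline got right, 9 only the fused system.
    baseline = [True] + [False] * 9 + [True] * 5
    fused_ok = [False] + [True] * 9 + [True] * 5
    print(mcnemar_exact_p(baseline, fused_ok))
```

In this example, the audio model alone would pick "da". The visual model's strong evidence for the bilabial closure of /b/ shifts the fused decision to "ba". Setting `visual_weight` to 0 recovers the audio-only engine, which serves as the natural baseline for the significance test.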
