CAREER: Inclusive, Private Mobile Input and Interaction Using Lip Reading

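Basic Information

Award number:
2239633
Principal investigator:
Ahmed Arif
Amount:
$636,300
Host institution:
University of California - Merced
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2023
Funding country:
United States
Start and end dates:
2023-04-15 to 2028-03-31
Project status:
Ongoing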

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2239633&HistoricalAwards=false
Keywords:
CAREER Inclusive Private Mobile Input

Project Abstract

Speech and whisper input on mobile devices can offer fast and seamless hands-free input and interaction to a wide range of users, including people with low vision and blindness. But there are many scenarios where speech and whisper are not viable due to ambient noise or because of privacy and security concerns, or even simply not to disturb other people. A system that understands speech by visually interpreting lip movements, known as image-based lip reading or silent speech, can mitigate many of these challenges. However, silent speech recognition systems are typically slower and more error prone than common speech recognition models, and they may require hardware that is impractical in real-world scenarios. Hence to date this approach has not been investigated as a serious alternative mode of interaction on mobile devices, and it is unknown how best to design the user interface for silent speech or the types of feedback that can enhance its usability. Silent speech has also not been well studied with people without sight. This research will develop an efficient real-time lip reader that uses the front camera of a mobile device to capture the motion of the lips and interprets that into text. A particular focus is on the design of an intuitive user interface that provides a range of visual, auditory, and tactile feedback to facilitate error free text entry. Even broader impacts will derive from providing access to mobile devices to a wider range of users, such as persons with speech disorders or who are mute. Ultimately, project outcomes could be exploited in virtual reality, automotive user interfaces, and many other systems to increase their usability, privacy, security and accessibility.

The real-time lip reader will slice and overlap live video feeds from a mobile camera to recognize one phoneme at a time as the user silently speaks by using a deep 3D convolutional neural network (3D-CNN), a recurrent network, and the connectionist temporal classification loss. It will be augmented with a refiner channel that will detect, auto-correct and provide feedback on both character and word-level errors using deep denoising autoencoder (DDA) and custom language models. A range of auditory and tactile feedback will be developed to facilitate error free input and uninterrupted camera view for people with low vision and blindness. The project will also develop multi-modal error correction approaches by exploiting speech, silent speech, and touch interactions. Finally, it will build a silent speech recognition API for the design and development of accessible mobile input and interaction techniques.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
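
The abstract mentions slicing and overlapping the live video feed and training with the connectionist temporal classification (CTC) loss. As a rough illustration only, and not the project's actual implementation, the Python sketch below shows what overlapping windows over a frame sequence might look like, together with the standard greedy CTC collapse rule (merge repeated labels, then drop blanks) applied to per-window labels. The function names, the window size and stride, and the `_` blank symbol are all hypothetical. The 3D-CNN and recurrent network that would actually produce the labels are omitted.

```python
from itertools import groupby
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")
BLANK = "_"  # hypothetical symbol standing in for the CTC blank label


def sliding_windows(frames: Sequence[T], size: int, stride: int) -> Iterator[Sequence[T]]:
    """Yield windows of `size` frames, advancing by `stride` frames each step.

    Consecutive windows overlap whenever stride < size.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    for start in range(0, len(frames) - size + 1, stride):
        yield frames[start:start + size]


def ctc_greedy_collapse(labels: Sequence[str], blank: str = BLANK) -> List[str]:
    """Apply the CTC collapse rule: merge adjacent repeats, then remove blanks."""
    return [label for label, _ in groupby(labels) if label != blank]


if __name__ == "__main__":
    frames = list(range(10))  # stand-in for 10 video frames
    for window in sliding_windows(frames, size=4, stride=2):
        print(window)
    # Stand-in for per-window model outputs (assumed, not real predictions).
    print(ctc_greedy_collapse(["h", "h", "_", "e", "_", "l", "_", "l", "o", "o"]))
```

In this sketch a stride smaller than the window size is what produces the overlap. The blank label between the two `l` outputs is what lets the collapse rule keep a genuinely repeated symbol rather than merging it.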
