Eye Gaze in Salience Modeling for Robust Spoken Language Understanding
Basic Information
- Award number: 0535112
- Principal investigator:
- Amount: --
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2005
- Funding country: United States
- Project period: 2005-11-15 to 2009-10-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract
In spoken dialog systems, interpreting user speech input remains a significant challenge due to limited speech recognition and language understanding performance. The problem is further amplified if a user has an accent or is speaking in a noisy environment. However, previous research has shown that, in multimodal systems, fusing two or more information sources can be an effective means of reducing recognition uncertainties, for example through mutual disambiguation. Inspired by earlier work on multimodal systems, in this project the PI will investigate the role of eye gaze in human-machine conversation, in particular in salience modeling for robust spoken language understanding. Cognitive studies have shown that human eye gaze is one of the reliable indicators of what a person is "thinking about"; specifically, eye gaze is tightly linked to human language processing. Previous psycholinguistic work has shown that almost immediately after hearing a word, the eyes move to the corresponding real-world referent, and right before speaking a word, the eyes move to the mentioned object. Not only is eye gaze highly reliable, it is also an implicit, subconscious reflex accompanying speech: the user does not need to make a conscious decision; the eyes move automatically toward the relevant object, without the user even being aware. Motivated by these psycholinguistic findings, the PI's hypothesis is that during human-machine conversation, user eye gaze coupled with conversation context can signal the part of the physical world (related to the domain and the graphical interface) that is most salient at each point of the communication, and thus can potentially be used to tailor the interpretation of speech input.
Based on this hypothesis, the PI will seek to improve spoken language understanding in conversational interfaces through a new salience-based framework with two objectives: (1) to better understand the role of eye gaze in human language production and its implications for salience modeling in automated input interpretation; and (2) to develop algorithms and systems that apply computational gaze-based salience modeling to robust spoken language understanding. These objectives will be pursued in four directions: (a) investigation, through psycholinguistic studies, of the utility of human eye gaze and its implications for salience modeling during human-machine conversation; (b) development of computational salience models that integrate eye gaze with conversation context to automatically identify the salient part of the physical world at each point of the communication; (c) development of approaches that apply the new salience models to constrain the hypothesis space for robust spoken language understanding; and (d) evaluation of the generality of the new approaches in two different applications: an interior design/training application based on a 3D rendered interface, and an information-seeking application using a 2D map-based interface.

Broader Impacts: The technologies to be developed in this interdisciplinary project can be applied to many applications, such as virtual training systems where users can see the interface and talk to the computer system at the same time. The technologies will benefit a variety of diverse users, particularly individuals who are unable to interact with graphical interfaces with their hands (e.g., motion-disabled users). Since one major application area of the work is e-training and e-learning, the education and outreach impact of the proposed research is potentially profound; the PI will make specific efforts to transfer the research results into classrooms.
The project will also provide a unique opportunity for students in Computer Science, Psychology, and Cognitive Science to work together, and thus will synergize multidisciplinary research activities at Michigan State University.
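As a rough illustration of directions (b) and (c) above, the sketch below shows one way gaze fixations could be fused with recognizer confidence to re-rank speech hypotheses. This is not the project's actual method; the recency-decay scheme, the fusion weight, and all function names are assumptions made purely for illustration.

```python
def gaze_salience(fixations, decay=0.8):
    """Hypothetical salience from gaze: objects fixated more often and
    more recently score higher (exponential recency decay)."""
    scores = {}
    n = len(fixations)
    for i, obj in enumerate(fixations):
        # later fixations (larger i) receive higher weight
        scores[obj] = scores.get(obj, 0.0) + decay ** (n - 1 - i)
    return scores

def rerank(hypotheses, salience, weight=0.5):
    """Re-rank (confidence, words) hypotheses by adding a bonus for the
    most salient object a hypothesis mentions."""
    def score(hyp):
        conf, words = hyp
        sal = max((salience.get(w, 0.0) for w in words), default=0.0)
        return conf + weight * sal
    return sorted(hypotheses, key=score, reverse=True)

# The user glanced at the table, then fixated the lamp twice;
# "ramp" and "lamp" are acoustically confusable.
sal = gaze_salience(["table", "lamp", "lamp"])
hyps = [(0.58, ["move", "the", "ramp"]),
        (0.55, ["move", "the", "lamp"])]
best = rerank(hyps, sal)  # the gaze-salient "lamp" reading now ranks first
```

The point of the sketch is only the shape of the computation: gaze plus context yields a salience distribution over interface objects, which then constrains (here, re-ranks) the recognizer's hypothesis space.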
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Joyce Chai
Improving Coherence of Language Model Generation with Latent Semantic State
- DOI:
- Publication date: 2022
- Journal:
- Impact factor: 0
- Authors: Amanda Askell;Yuntao Bai;Anna Chen;Dawn Drain;Deep Ganguli;T. Henighan;Andy Jones;Benjamin Mann;Nova Dassarma;Nelson El;Zac Hatfield;Danny Hernandez;John Kernion;Kamal Ndousse;Catherine Olsson;Dario Amodei;Tom Brown;J. Clark;Sam Mc;Chris Olah;Jared Kaplan;Nick Ryder;Jared D Subbiah;Prafulla Kaplan;A. Dhariwal;P. Neelakantan;Girish Shyam;Amanda Sastry;Sandhini Askell;Ariel Agarwal;Herbert;Gretchen Krueger;R. Child;Aditya Ramesh;Daniel M. Ziegler;Jeffrey Wu;Christopher Winter;Mark Hesse;Eric Chen;Mateusz Sigler;Scott teusz Litwin;Benjamin Gray;Jack Chess;Christopher Clark;Sam Berner;Alec McCandlish;Ilya Radford;Sutskever Dario;Amodei;Joshua Maynez;Shashi Narayan;Bernd Bohnet;Kurt Shuster;Spencer Poff;Moya Chen;Douwe Kiela;Shane Storks;Qiaozi Gao;Yichi Zhang;Joyce Chai;Niket Tandon;Keisuke Sakaguchi;Bhavana Dalvi;Dheeraj Rajagopal;Peter Clark;Michal Guerquin;Kyle Richardson;Eduard H. Hovy;A. Dataset;Rowan Zellers;Ari Holtzman;Matthew E. Peters;Roozbeh Mottaghi;Aniruddha Kembhavi;Ali Farhadi;Chunting Zhou;Graham Neubig;Jiatao Gu;Mona Diab;Francisco Guzmán;Luke Zettlemoyer
- Corresponding author: Luke Zettlemoyer
A pilot study of pre-operative misoprostol in reducing operative blood loss during hysterectomy
- DOI: 10.1016/j.ejogrb.2011.03.023
- Publication date: 2011-09-01
- Journal:
- Impact factor:
- Authors: Joyce Chai;Edmund Hon;Chiu-Fai Li;Ting-Chung Pun;Shu-Biu Yeung;Pak-Chung Ho
- Corresponding author: Pak-Chung Ho
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
- DOI:
- Publication date: 2024
- Journal:
- Impact factor: 0
- Authors: Jianing Yang;Xuweiyi Chen;Nikhil Madaan;Madhavan Iyengar;Shengyi Qian;D. Fouhey;Joyce Chai
- Corresponding author: Joyce Chai
Continuing Medical Education Postmenopausal Bleeding
- DOI:
- Publication date: 2012
- Journal:
- Impact factor: 0
- Authors: Joyce Chai;Vincent YT Cheung
- Corresponding author: Vincent YT Cheung
BAD: BiAs Detection for Large Language Models in the context of candidate screening
- DOI:
- Publication date: 2023
- Journal:
- Impact factor: 0
- Authors: N. Koh;Joseph Plata;Joyce Chai
- Corresponding author: Joyce Chai
Other grants by Joyce Chai
NRI: INT: COLLAB: Collaborative Task Planning and Learning through Language Communication in a Human-Robot Team
- Award number: 1949634
- Fiscal year: 2019
- Funding amount: --
- Project type: Standard Grant
NRI: INT: COLLAB: Collaborative Task Planning and Learning through Language Communication in a Human-Robot Team
- Award number: 1830244
- Fiscal year: 2018
- Funding amount: --
- Project type: Standard Grant
RI: Small: Extending Verb Semantics with Causality towards Physical World
- Award number: 1617682
- Fiscal year: 2016
- Funding amount: --
- Project type: Standard Grant
WORKSHOP: Student Consortium at the 2014 ACM Conference on Intelligent User Interfaces
- Award number: 1415879
- Fiscal year: 2013
- Funding amount: --
- Project type: Standard Grant
NRI-Small: Contextually Grounded Collaborative Discourse for Mediating Shared Basis in Situated Human Robot Dialogue
- Award number: 1208390
- Fiscal year: 2012
- Funding amount: --
- Project type: Standard Grant
EAGER: Shared Gaze in Collaborative Referring
- Award number: 1050004
- Fiscal year: 2010
- Funding amount: --
- Project type: Standard Grant
II-NEW: Towards an Infrastructure for Research on Multimodal Language Processing in Situated Human Robot Dialogue
- Award number: 0957039
- Fiscal year: 2010
- Funding amount: --
- Project type: Standard Grant
SGER: Collaborative Research: Contextual Machine Translation
- Award number: 0840538
- Fiscal year: 2008
- Funding amount: --
- Project type: Standard Grant
CAREER: Learning and Optimization for Robust Multimodal Interpretation in Conversation Systems
- Award number: 0347548
- Fiscal year: 2004
- Funding amount: --
- Project type: Continuing Grant
Similar overseas grants
The Material Gaze: Horizontalizing the Anthropos through Film
- Award number: 2891808
- Fiscal year: 2023
- Funding amount: --
- Project type: Studentship
From lab to math classroom: Utilizing eye gaze and cognitive control tasks to examine the effects of perceptual cues and structure on mathematical performance
- Award number: 2320053
- Fiscal year: 2023
- Funding amount: --
- Project type: Standard Grant
SBIR Phase II: Gaze-independent contactless autorefractor for self-serve eye exam kiosk.
- Award number: 2322305
- Fiscal year: 2023
- Funding amount: --
- Project type: Cooperative Agreement
Multimodal analysis using pen input data and gaze data in science and mathematics e-learning
- Award number: 23K17589
- Fiscal year: 2023
- Funding amount: --
- Project type: Grant-in-Aid for Challenging Research (Exploratory)
Transferring Pharmacists' Safe and Efficient Dispensing Know-How: Identifying Reasons for Success of Proficient Pharmacists Based on Eye Gaze Measurement.
- Award number: 23K04309
- Fiscal year: 2023
- Funding amount: --
- Project type: Grant-in-Aid for Scientific Research (C)
Reversing the Gaze: Knowledge Stories and the Struggles for Community Land Rights in Scotland
- Award number: ES/X010872/1
- Fiscal year: 2023
- Funding amount: --
- Project type: Research Grant
Racial biases in gaze following from infancy to adulthood
- Award number: 2884597
- Fiscal year: 2023
- Funding amount: --
- Project type: Studentship
An Indigenous gaze: looking through and envisioning contemporary Indigenous visual representations in the Andean region
- Award number: 2784450
- Fiscal year: 2023
- Funding amount: --
- Project type: Studentship
A Gaze-enabled 3D Face Morphable Model and Applications for Face Image Synthesis
- Award number: 22KJ0923
- Fiscal year: 2023
- Funding amount: --
- Project type: Grant-in-Aid for JSPS Fellows
Propose sightseeing guidance suitable for each tourist by analyzing eye gaze and AI-based preference analysis during web browsing
- Award number: 23K11635
- Fiscal year: 2023
- Funding amount: --
- Project type: Grant-in-Aid for Scientific Research (C)