Study on a method of recovery from user's error in a multimodal information environment


Basic information

  • Grant number:
    13680407
  • Principal investigator:
  • Amount:
    $23,000
  • Host institution:
  • Host institution country:
    Japan
  • Project category:
    Grant-in-Aid for Scientific Research (C)
  • Fiscal year:
    2001
  • Funding country:
    Japan
  • Duration:
    2001 to 2003
  • Project status:
    Completed

Project summary

Multimodal systems have the potential to greatly improve the flexibility, robustness, efficiency, universal accessibility and naturalness of human-machine interaction. This study investigated two multimodal techniques related to the integration of speech and gaze, because humans naturally use these two modalities to communicate with each other.

The first study concerned a gaze-and-mouse multimodal user interface. Eye gaze naturally indicates a person's attention and interests, and eye movement is rapid, so gaze information can provide a quick, natural and convenient input method. To improve the accuracy of gaze input, a gaze-and-mouse complementary method was proposed: the gaze modality improves speed by selecting targets directly or by shortening the mouse's travel distance, while the mouse improves accuracy when the gaze fixation lands far from the target.

The second study concerned gaze-and-speech multimodal input methodologies. We use these two modalities naturally and simultaneously in daily life, especially when determining deictic referents in a spoken dialogue. However, recognition ambiguities in speech and gaze inputs are inevitable. Since both gaze and speech are error-prone when used alone, the goal of this study was to build an effective and robust human-computer interaction system from these modalities. The features of the speech-and-gaze multimodal system are as follows:

  • The multimodal architecture supports mutual correction of recognition errors across the component modalities: speech recognition errors can be corrected by gaze, and vice versa. Even when both gaze and speech recognition errors occur, the correct multimodal result can still be obtained.
  • Ambiguities in the speech signal can be resolved by gaze information. The multimodal architecture eliminates the lengthy definite descriptions that would be necessary for unnamed objects if speech alone were used, so gaze information significantly simplifies the user's speech. Simplified speech causes fewer recognition errors, facilitates both error avoidance and user acceptance, and provides a natural, intuitive way to interact with the computer.
  • Simplified speech also improves interaction speed and gives users an efficient multimodal interface.
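To illustrate the first study's complementary scheme, here is a minimal sketch in Python. The routine name, the warp threshold, and the dwell time are hypothetical and not taken from the project; the sketch only mirrors the stated idea that gaze supplies speed (direct selection, or warping the cursor near the fixation to shorten mouse travel) while the mouse supplies final accuracy.

```python
# Minimal sketch of a gaze-and-mouse complementary pointing scheme
# (hypothetical names and constants; not the project's actual implementation).

import math

WARP_THRESHOLD_PX = 120   # assumed distance beyond which the cursor warps to the fixation
DWELL_SELECT_MS = 400     # assumed dwell time for direct selection by gaze


def distance(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def update_cursor(cursor, fixation, dwell_ms, target=None, target_radius=20):
    """Return (new_cursor, selected_target).

    * If the fixation dwells on a target long enough, select it directly (gaze speed).
    * Otherwise, if the cursor is far from the fixation, warp it to the fixation so the
      remaining mouse movement is short; fine positioning stays with the mouse (mouse accuracy).
    """
    if target is not None and dwell_ms >= DWELL_SELECT_MS \
            and distance(fixation, target) <= target_radius:
        return fixation, target            # direct gaze selection

    if distance(cursor, fixation) > WARP_THRESHOLD_PX:
        return fixation, None              # warp: shorten the mouse's travel distance

    return cursor, None                    # close enough: leave fine control to the mouse


if __name__ == "__main__":
    cursor, fixation = (10, 10), (600, 400)
    cursor, picked = update_cursor(cursor, fixation, dwell_ms=120)
    print(cursor, picked)  # cursor warped to (600, 400), nothing selected yet
```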
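For the second study, the following is a simplified stand-in for the described mutual-correction architecture, not the project's actual algorithm. It assumes hypothetical n-best hypothesis lists with confidence scores for each modality; the point is only that a misrecognition in one modality can be overridden by evidence from the other when resolving a deictic referent.

```python
# Minimal sketch of mutual disambiguation between speech and gaze
# (hypothetical data structures; a simplified illustration, not the project's method).
# Each modality yields an n-best list of (object_id, confidence) hypotheses; the referent
# is the object that maximizes the combined score, so an error in one modality can be
# compensated by the other.

def resolve_referent(speech_nbest, gaze_nbest, alpha=0.5):
    """Return the object_id with the highest weighted combination of the two
    confidences; alpha balances speech against gaze."""
    speech = dict(speech_nbest)
    gaze = dict(gaze_nbest)
    candidates = set(speech) | set(gaze)
    return max(
        candidates,
        key=lambda obj: alpha * speech.get(obj, 0.0) + (1 - alpha) * gaze.get(obj, 0.0),
    )


if __name__ == "__main__":
    # Speech misrecognizes "the red cup" (its top hypothesis is wrong),
    # but gaze fixations on the cup pull the combined result back.
    speech_nbest = [("red_cap", 0.55), ("red_cup", 0.45)]
    gaze_nbest = [("red_cup", 0.80), ("blue_cup", 0.20)]
    print(resolve_referent(speech_nbest, gaze_nbest))  # -> "red_cup"
```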

Project outcomes

Journal articles (5)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Qiaohui Zhang, Atsumi Imamiya, Kentaro Go, Xiaoyang Mao: "Designing a Robust Speech and Gaze Multimodal System for Diverse Users", Proceedings of the 2003 IEEE International Conference on Information Reuse and Integration, 354-361 (2003)
Qiaohui Zhang, Atsumi Imamiya, Kentaro Go, Xiaoyang Mao: "A Gaze and Speech Multimodal Interface", Proceedings of the 6th International Workshop on Multimedia Network Systems and Applications (MNSA2004), IEEE Computer Society, Tokyo, Japan, March 2004. (Accepted)
Qiaohui Zhang, Atsumi Imamiya, Kentaro Go, Xiaoyang Mao: "Resolving Ambiguities of a Gaze and Speech Interface", Proceedings of the ACM ETRA 2004 Symposium on Eye Tracking Research and Applications. (To be presented, 2004)

Other grants by IMAMIYA Atsumi

Establishing a usability evaluation method based on the mental workload of visual attention and constructing the evaluation system
  • Grant number:
    19500106
  • Fiscal year:
    2007
  • Funding amount:
    $23,000
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Establishing Usability Evaluation Method and Developing the Evaluation System based on Multimodal Information
  • Grant number:
    16500055
  • Fiscal year:
    2004
  • Funding amount:
    $23,000
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Developing the mechanism of recovery from error and user interface evaluation metrics, and proposing a user model
  • Grant number:
    08680429
  • Fiscal year:
    1996
  • Funding amount:
    $23,000
  • Project category:
    Grant-in-Aid for Scientific Research (C)