
A study on multi-modal man-machine interface through spontaneous speech


Basic information

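Grant number:
06452401
Principal investigator:
NAKAGAWA Seiichi
Amount:
$33,900
Host institution:
Toyohashi University of Technology
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (B)
Fiscal year:
1994
Funding country:
Japan
Project period:
1994 to 1996
Project status:
Completed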

Project abstract

We developed a multi-modal dialogue system composed of four parts: input by speech recognizer and touch screen, a graphical user interface, a natural language interpreter, and a response generator. Our speech recognizer integrates the acoustic process directly with the linguistic process, without an intermediate phrase or word lattice. Furthermore, the recognizer processes interjections and restarts based on an unknown-word processing technique. To recognize spontaneous speech, the context-free grammar is written to accept sentences with omitted postpositions and inverted word order. Although our spontaneous speech recognizer outputs some errors caused by misrecognition (substitution errors), out-of-vocabulary input (unknown words), and out-of-grammar input (illegal utterances), the language interpreter can understand the meaning of erroneous or illegal utterances. Touch-screen input is used to designate a location on the map shown on the display, or to select the desired item from a menu consisting of the items given in the speech synthesizer's response. We use both display output (map and menu) and speech synthesis for the response. The user can use positioning/selecting input and speech input at the same time. In man-machine communication, the user wants to know his own situation and the machine's: what information he has obtained from the dialogue and how the machine interprets and understands his utterances, as well as the speech recognition result. Therefore our system displays the history of the dialogue. This function helps to eliminate the user's uneasiness. Experimental evaluation showed that our interpretation mechanism was suitable for understanding the recognition results of spontaneous speech. We also found that the multi-modal interface with spontaneous speech and touch screen was user-friendly.
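The interpretation step described in the abstract can be illustrated with a small Python sketch. It is not the project's implementation: a simple keyword slot filler stands in for the context-free grammar and language interpreter, the example is in English rather than Japanese, and every name in it (`INTERJECTIONS`, `ACTIONS`, `CATEGORIES`, `DEICTIC`, `Touch`, `Dialogue`) is a hypothetical choice made for illustration. It shows three ideas only: interjections and unknown words are skipped instead of causing a failure, a touch position fills in a deictic word such as "here", and each turn is stored in a dialogue history that could be shown to the user.

```python
from dataclasses import dataclass, field

# Hypothetical vocabularies; the abstract does not list the system's lexicon.
INTERJECTIONS = {"uh", "um", "er", "well"}
ACTIONS = {"show": "show", "find": "show", "where": "locate"}
CATEGORIES = {"hotel": "hotel", "hotels": "hotel",
              "restaurant": "restaurant", "restaurants": "restaurant"}
DEICTIC = {"here", "there", "this"}


@dataclass
class Touch:
    """A touch-screen position given together with the utterance."""
    x: int
    y: int


@dataclass
class Dialogue:
    """Keeps the history of turns so it can be displayed to the user."""
    history: list = field(default_factory=list)

    def interpret(self, words, touch=None):
        frame = {"action": None, "category": None,
                 "location": None, "skipped": []}
        for word in words:
            word = word.lower()
            if word in INTERJECTIONS:
                continue  # fillers carry no meaning
            if word in ACTIONS:
                # A later value overwrites an earlier one, so a restart
                # ("find ... show ...") keeps the speaker's last choice.
                frame["action"] = ACTIONS[word]
            elif word in CATEGORIES:
                frame["category"] = CATEGORIES[word]
            elif word in DEICTIC and touch is not None:
                # The touch position resolves the deictic word.
                frame["location"] = (touch.x, touch.y)
            else:
                # Unknown or ungrammatical material is recorded and skipped.
                frame["skipped"].append(word)
        self.history.append((" ".join(words), frame))
        return frame


dialogue = Dialogue()
frame = dialogue.interpret("uh show me hotels near here".split(), Touch(120, 80))
print(frame)
```

Because the slots are filled independently of word order, an utterance with inverted order or a missing function word yields the same frame, which is the tolerance the abstract attributes to its grammar and interpreter.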
