Exploiting Speech Understanding in Intelligent Interfaces
Basic Information
- Grant Number: 06044055
- Principal Investigator:
- Amount: $26,900
- Host Institution:
- Host Institution Country: Japan
- Category: Grant-in-Aid for International Scientific Research
- Fiscal Year: 1994
- Funding Country: Japan
- Period: 1994 to 1995
- Status: Completed
- Source:
- Keywords:
Project Summary
We are interested in the use of spoken language in human-computer interaction. The inspiration is the fact that, in human-human interaction, meaningful exchanges can take place even without accurate recognition of the words the other is saying; this is possible thanks to shared knowledge and complementary communication channels, especially gesture and prosody. We want to exploit this fact in man-machine interfaces. We are therefore doing three things:
1. Using simple speech recognition to augment graphical user interfaces, well integrated with other input modalities: keyboard, mouse, and touch screen.
2. Building systems able to engage in simple conversations, using mostly prosodic clues. To sketch our latest success: we conjectured that, for Japanese, it would be possible to decide when to produce back-channel utterances based on prosodic clues alone, without reference to meaning. We found that neither vowel lengthening, volume changes, nor energy level (to detect when the other speaker had finished) was by itself a good predictor of when to produce an aizuchi. The best predictor was a low pitch level. Specifically, upon detecting the end of a region of pitch less than 0.9 times the local median pitch that had continued for 150 ms, coming after at least 600 ms of speech, the system predicted an aizuchi 200 ms to 300 ms later, provided it had not done so within the preceding 1 second. We also built a real-time system based on this decision rule. A human stooge steered the conversation to a suitable topic and then switched on the system; after switch-on, the stooge's utterances and the system's outputs, mixed together, made up one side of the conversation. None of the 5 subjects realized that their conversation partner had become partially automated.
3. Building tools and collecting data to support 1 and 2.
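The pitch-based decision rule can be sketched as a simple frame-based detector. The sketch below is illustrative, not the project's implementation: the 10 ms frame size, the 2 s window for the local pitch median, the convention of marking unvoiced frames with 0, and the function name are all assumptions; only the numeric thresholds (0.9 times the local median, a 150 ms low-pitch region, 600 ms of prior speech, a 1 s refractory period, a response roughly 250 ms later) come from the text.

```python
# Illustrative sketch of the back-channel (aizuchi) decision rule.
# Frame size, median window, and names are assumptions; thresholds
# follow the rule described in the summary above.

from statistics import median

FRAME_MS = 10  # assumed pitch-analysis frame length


def backchannel_times(pitch, frame_ms=FRAME_MS,
                      median_window_ms=2000,
                      low_ratio=0.9,
                      low_region_ms=150,
                      min_speech_ms=600,
                      refractory_ms=1000,
                      delay_ms=250):
    """Return times (ms) at which the rule predicts a back-channel.

    `pitch` is a list of per-frame pitch values in Hz; 0 marks
    unvoiced or silent frames.
    """
    predictions = []
    low_run = 0                     # length (ms) of current low-pitch region
    speech_run = 0                  # length (ms) of continuous speech so far
    last_pred = -refractory_ms      # allows a prediction on the first region
    win = median_window_ms // frame_ms

    for i, f0 in enumerate(pitch):
        t = i * frame_ms
        # Local median pitch over the recent voiced frames.
        voiced = [p for p in pitch[max(0, i - win):i + 1] if p > 0]
        local_med = median(voiced) if voiced else 0

        if f0 > 0:
            speech_run += frame_ms
            if local_med and f0 < low_ratio * local_med:
                low_run += frame_ms   # still inside the low-pitch region
                continue

        # The low-pitch region has just ended (pitch rose or speech stopped).
        if (low_run >= low_region_ms
                and speech_run >= min_speech_ms
                and t - last_pred >= refractory_ms):
            predictions.append(t + delay_ms)   # respond 200-300 ms later
            last_pred = t
        low_run = 0
        if f0 == 0:
            speech_run = 0
    return predictions
```

For example, a second of speech at 120 Hz followed by a 200 ms dip to 90 Hz (below 0.9 times the 120 Hz local median) triggers exactly one prediction shortly after the dip ends.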
Project Outcomes
- Journal articles: 19
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Tajchman, Gary, Dan Jurafsky, and Eric Fosler: "Learning Phonological Rule Probabilities from Speech Corpora with Exploratory Computational Phonology" In Proceedings of ACL-95. 9-15 (1995)
Gildea, Daniel and Daniel Jurafsky: "Learning Bias and Phonological-Rule Induction" Computational Linguistics. (1995)
Ward, Nigel: "Using Prosodic Clues to Decide When to Produce Back-Channel Utterances" ICSLP.
Jurafsky, Daniel: "A Probabilistic Model of Lexical and Syntactic Access and Disambiguation" Cognitive Science.
Nigel Ward: "An Approach to Tightly-Coupled Syntactic/Semantic Processing for Speech Understanding" Proceedings of the AAAI Workshop on the Integration of Natural Language and Speech Processing. 50-57 (1994)
Other Grants by WARD Nigel
An Interactive Reflex-Training System for Developing Foreign-Language Conversation Ability
- Grant Number: 12040209
- Fiscal Year: 2000
- Amount: $26,900
- Category: Grant-in-Aid for Scientific Research on Priority Areas (A)
Non-lexical Sounds: a New Interface Modality for Voice-based Information Delivery Systems
- Grant Number: 11680412
- Fiscal Year: 1999
- Amount: $26,900
- Category: Grant-in-Aid for Scientific Research (C)
Research on a Finger-Motion Training Support System for Machine Operation Using Real-Time Speech Understanding and Response
- Grant Number: 08750301
- Fiscal Year: 1996
- Amount: $26,900
- Category: Grant-in-Aid for Encouragement of Young Scientists (A)
Similar Overseas Grants
Peripheral and central contributions to auditory temporal processing deficits and speech understanding in older cochlear implantees
- Grant Number: 10444172
- Fiscal Year: 2022
- Amount: $26,900
- Category:
Effects of Non-Blast mTBI on Binaural Processing and Speech Understanding in Noise
- Grant Number: 10537947
- Fiscal Year: 2022
- Amount: $26,900
- Category:
Peripheral and central contributions to auditory temporal processing deficits and speech understanding in older cochlear implantees
- Grant Number: 10630111
- Fiscal Year: 2022
- Amount: $26,900
- Category:
Individual differences in brain networks supporting speech understanding in patients with cochlear implants
- Grant Number: 10366520
- Fiscal Year: 2021
- Amount: $26,900
- Category:
Individual differences in brain networks supporting speech understanding in patients with cochlear implants
- Grant Number: 10743568
- Fiscal Year: 2021
- Amount: $26,900
- Category:
End-to-End Model for Task-Independent Speech Understanding and Dialogue
- Grant Number: 20H00602
- Fiscal Year: 2020
- Amount: $26,900
- Category: Grant-in-Aid for Scientific Research (A)
Speech understanding ability and communication intervention for persons with age-related hearing loss and mild cognitive impairment or dementia
- Grant Number: 10437659
- Fiscal Year: 2018
- Amount: $26,900
- Category:
Speech understanding ability and communication intervention for persons with age-related hearing loss and mild cognitive impairment or dementia
- Grant Number: 10201560
- Fiscal Year: 2018
- Amount: $26,900
- Category:
Using Electrophysiology to Complement Speech Understanding-in-Noise Measures
- Grant Number: 9906072
- Fiscal Year: 2017
- Amount: $26,900
- Category:
Temporal processing and speech understanding in older cochlear implantees
- Grant Number: 9355563
- Fiscal Year: 2016
- Amount: $26,900
- Category: