
CompCog: Deep causal inference grounds the perception of cognitive objects in speech

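Basic Information

Award number:
2240349
Principal investigator:
Khalil Iskarous
Amount:
$600,000
Host institution:
University of Southern California
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2023
Funding country:
United States
Start and end dates:
2023-08-15 to 2026-07-31
Project status:
Ongoing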

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2240349&HistoricalAwards=false
Keywords:
CompCog Deep causal inference grounds

Project Abstract

Artificial intelligence systems are becoming more and more important in society, and their performance has improved enormously in recent years. Yet we still do not understand how these systems actually work, or how they emulate human performance, to the extent that they do. In this work, novel methods are developed for probing the inner workings of these systems by comparing their internal computations with the corresponding computations humans perform. The specific skill we probe is speech recognition, a highly complex process, as speech is a richly variable, information-dense, and quickly transmitted medium of communication. One of the ways that the human brain deals with this complexity during speech recognition is by engaging not only brain areas responsible for listening but also areas crucial to the production of speech. This suggests that human cognition is aware of the systems in the world that cause speech: the movements of the lips, tongue, and other vocal organs. Do current artificial intelligence systems also develop such a deep causal understanding of speech? In this work we answer this question by delving into the mathematical models of both human and machine knowledge in these systems. Our work has two major goals. The first is technological: understanding how artificial intelligence systems actually work on the inside, which is ultimately a necessary step in directing their abilities to societal benefit. The second is scientific: before their widespread use in society, artificial intelligence systems were developed by cognitive scientists to understand human cognition, and by probing the inner workings of state-of-the-art machine learning as a cognitive model we may be able to better understand how humans perceive speech. In this work, therefore, science and technology further each other, as they have done successfully in the past.

This research program specifically probes the relationship between the production and perception of speech in humans and computers. To do so, speakers of three languages (English, Russian, Korean) are imaged using real-time Magnetic Resonance Imaging (MRI), which shows in vivid detail how the speech articulators move. The speakers' speech audio signals are recorded simultaneously. The data are analyzed using mathematical models of speech production, modern speech recognition systems, and mathematical models of how human neural rhythms analyze speech. Experimental manipulations unveil how the representations in each of the systems correspond to those in the others. This strategy informs us about the science of human cognition hand-in-hand with illuminating the black-box technology of machine emulation of the human capability. In the future, in addition to advancing science and technology, we anticipate the application of this knowledge to the creation of novel small-sized speech recognition systems that can assist in the documentation of endangered languages.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
