ITR: Mining Text for General World Knowledge

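Basic information

Award number:
0082928
Principal investigator:
Lenhart Schubert
Amount:
approximately $449,700
Institution:
University of Rochester
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2000
Funding country:
United States
Start and end dates:
2000-09-01 to 2003-12-31
Project status:
Completed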

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=0082928&HistoricalAwards=false
Keywords:
ITR Mining Text General World

Project abstract

Despite significant advances in recent years in speech recognition and generation technology and in statistical language modeling, existing natural language systems are still limited to very specific, narrow domains, and totally lack common sense - the ability to "see the obvious" when interacting with a user. A major reason for this is the lack of a broad base of general world knowledge in current AI systems - knowledge such as that a sandwich is food (for humans), while dinnerware is not; that dwellings usually have doors and walls; or that when one person is killed by another, it is often with a gun. This project will use previous work on mining linguistic knowledge from text as a springboard for tackling the problem of mining general world knowledge from texts. The methodology depends neither on "deep" text understanding nor on explicit occurrence of the desired general facts in the targeted corpora. Rather, the PI's approach elaborates on the idea that regularities observed in patterns of predication in texts generally reflect regularities in the world, particularly regularities in the way certain types of entities jointly participate in various events and relationships. While absolute statistical frequencies of such patterns can be severely misleading (people do not commit crimes, have accidents, or hold public office nearly as often as scanning of newspapers might suggest), the techniques that will be employed rely on conditional frequencies to obtain factually reliable hypotheses. The knowledge extracted will be cast in a formally interpretable propositional form, lending itself to certain and uncertain inference. This in turn will help "sanitize" the extracted knowledge, by revealing and helping to remedy apparent contradictions.
Suitable corpora for this work include not only newspapers and other factual sources, but also realistic novels and writings for children - in fact, almost all electronically accessible texts are potentially useful, and no annotation will be required. While not all kinds of common-sense knowledge can be acquired in this way, the knowledge that can be acquired is very extensive, is essential to language understanding and common-sense reasoning, and is relatively close at hand. The kind of general knowledge to be mined from text corpora is not only useful, but essential in the long run for intelligent systems with some general linguistic competence and a modicum of common sense. Thus the work will bring a step closer the prospect of computers that genuinely understand their users.
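
The contrast the abstract draws between absolute and conditional frequencies can be illustrated with a small sketch. This is not the PI's method: the triples below are invented toy data, and the names `observations`, `absolute_frequency`, and `conditional_frequency` are hypothetical. The sketch only shows that a pattern which is rare across a whole corpus can still be the dominant choice once the context (here, a predicate and a role) is fixed.

```python
from collections import Counter

# Hypothetical toy "predication patterns" as (predicate, role, filler) triples.
# A real system would derive these from parsed text; these are invented.
observations = [
    ("kill", "instrument", "gun"),
    ("kill", "instrument", "gun"),
    ("kill", "instrument", "knife"),
    ("eat", "object", "sandwich"),
    ("eat", "object", "sandwich"),
    ("eat", "object", "apple"),
    ("eat", "object", "sandwich"),
    ("dwelling", "part", "door"),
    ("dwelling", "part", "wall"),
]


def absolute_frequency(obs, triple):
    """Share of all observations that match the full triple."""
    return Counter(obs)[triple] / len(obs)


def conditional_frequency(obs, predicate, role, filler):
    """Share of observations with this predicate and role that have this filler."""
    context = [o for o in obs if o[0] == predicate and o[1] == role]
    if not context:
        return 0.0  # no evidence for this context
    return sum(1 for o in context if o[2] == filler) / len(context)


# Absolute frequency reflects how often the corpus talks about killing at all.
abs_gun = absolute_frequency(observations, ("kill", "instrument", "gun"))
# Conditional frequency asks: given a killing with an instrument, how often a gun?
cond_gun = conditional_frequency(observations, "kill", "instrument", "gun")
print(f"absolute: {abs_gun:.2f}, conditional: {cond_gun:.2f}")
```

In this toy data the gun pattern accounts for 2 of 9 observations overall but 2 of 3 within its own context, which is the kind of within-context regularity the abstract treats as the more reliable signal.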

Project outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Lenhart Schubert

SOPHIE: Testing a Virtual, Interactive, AI-Augmented End-of-Life Communication Training Tool (RP122)

DOI:
10.1016/j.jpainsymman.2024.02.469
Publication date:
2024-05-01
Journal:
JOURNAL OF PAIN AND SYMPTOM MANAGEMENT
Impact factor:
3.500
Authors:
Kurtis G. Haut;Ronald Epstein;Thomas M. Carroll;Benjamin Kane;Lenhart Schubert;Ehsan Hoque
Corresponding author:
Ehsan Hoque