ITR: Mining Text for General World Knowledge
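Basic information
- Award number: 0082928
- Principal investigator:
- Amount: $449,700
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2000
- Funding country: United States
- Duration: 2000-09-01 to 2003-12-31
- Status: Completed
- Source:
- Keywords:
Project abstract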
Despite significant advances in recent years in speech recognition and generation technology and statistical language modeling, existing natural language systems are still limited to very specific, narrow domains, and totally lack common sense - the ability to "see the obvious" when interacting with a user. A major reason for this is the lack of a broad base of general world knowledge in current AI systems - knowledge such as that a sandwich is food (for humans), while dinnerware is not; that dwellings usually have doors and walls; or, that when one person is killed by another, it is often with a gun; etc. This project will use previous work on mining linguistic knowledge from text as a springboard for tackling the problem of mining general world knowledge from texts. The methodology depends neither on "deep" text understanding nor on explicit occurrence of the desired general facts in the targeted corpora. Rather, the PI's approach elaborates on the idea that regularities observed in patterns of predication in texts generally reflect regularities in the world, particularly regularities in the way certain types of entities jointly participate in various events and relationships. While absolute statistical frequencies of such patterns can be severely misleading (people do not commit crimes, or have accidents or hold public office nearly as often as scanning of newspapers might suggest), the techniques that will be employed rely on conditional frequencies to obtain factually reliable hypotheses. The knowledge extracted will be cast in a formally interpretable propositional form, lending itself to certain and uncertain inference. This in turn will help "sanitize" the extracted knowledge, by revealing and helping to remedy apparent contradictions.
Suitable corpora for this work include not only newspapers and other factual sources, but also realistic novels and writings for children - in fact, almost all electronically accessible texts are potentially useful, and no annotation will be required. While not all kinds of common-sense knowledge can be acquired in this way, the knowledge that can be acquired is very extensive, is essential to language understanding and common-sense reasoning, and is relatively close at hand. The kind of general knowledge to be mined from text corpora is not only useful, but essential in the long run for intelligent systems with some general linguistic competence and a modicum of common sense. Thus the work will bring a step closer the prospect of computers that genuinely understand their users.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Lenhart Schubert
SOPHIE: Testing a Virtual, Interactive, AI-Augmented End-of-Life Communication Training Tool (RP122)
- DOI: 10.1016/j.jpainsymman.2024.02.469
- Published: 2024-05-01
- Journal:
- Impact factor: 3.500
- Authors: Kurtis G. Haut; Ronald Epstein; Thomas M. Carroll; Benjamin Kane; Lenhart Schubert; Ehsan Hoque
- Corresponding author: Ehsan Hoque
Monotonic Inference with Unscoped Episodic Logical Forms: From Principles to System
- DOI: 10.1007/s10849-023-09412-2
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: G. Kim; Mandar Juvekar; Junis Ekmekciu; Viet; Lenhart Schubert
- Corresponding author: Lenhart Schubert
Other grants by Lenhart Schubert
EAGER: Learning a High-Fidelity Semantic Parser
- Award number: 1940981
- Fiscal year: 2019
- Amount: $449,700
- Award type: Standard Grant
RI: Small: Adapting a Natural Logic Reasoning Platform to the Task of Entailment Inference
- Award number: 1016735
- Fiscal year: 2010
- Amount: $449,700
- Award type: Standard Grant
RI: Small: General Knowledge Bootstrapping from Text
- Award number: 0916599
- Fiscal year: 2009
- Amount: $449,700
- Award type: Continuing Grant
IIS: Knowledge Representation and Reasoning Mechanisms for Explicitly Self-Aware Communicative Agents
- Award number: 0535105
- Fiscal year: 2006
- Amount: $449,700
- Award type: Standard Grant
Deriving General World Knowledge from Texts by Abstraction of Logical Forms
- Award number: 0328849
- Fiscal year: 2003
- Amount: $449,700
- Award type: Standard Grant
Robust, Incremental Parsing and Disambiguation for a Dialog Agent
- Award number: 9503312
- Fiscal year: 1995
- Amount: $449,700
- Award type: Continuing Grant
The Representation of Unreliable General Knowledge for Narrative Understanding
- Award number: 9013160
- Fiscal year: 1991
- Amount: $449,700
- Award type: Continuing Grant
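Similar NSFC (National Natural Science Foundation of China) grants
Study of secondary metabolites that inhibit Staphylococcus epidermidis biofilm formation, based on genome mining technology
- Award number: 21242003
- Approval year: 2012
- Amount: CNY 100,000
- Award type: Special Fund Project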
Similar overseas grants
Development of social attention indicators of emerging technologies and science policies with network analysis and text mining
- Award number: 24K16438
- Fiscal year: 2024
- Amount: $449,700
- Award type: Grant-in-Aid for Early-Career Scientists
CAREER: Mining Hints from Text Documents to Guide Automated Database Performance Tuning
- Award number: 2239326
- Fiscal year: 2023
- Amount: $449,700
- Award type: Continuing Grant
Research on Gender Differences in Entrepreneurship Using Text Mining
- Award number: 23K01607
- Fiscal year: 2023
- Amount: $449,700
- Award type: Grant-in-Aid for Scientific Research (C)
Semantic Representations for Interactive Text Mining
- Award number: RGPIN-2020-04834
- Fiscal year: 2022
- Amount: $449,700
- Award type: Discovery Grants Program - Individual
Next generation Text Mining in Drug Discovery
- Award number: BB/X511833/1
- Fiscal year: 2022
- Amount: $449,700
- Award type: Training Grant
Next generation Text Mining in Drug Discovery
- Award number: 2760490
- Fiscal year: 2022
- Amount: $449,700
- Award type: Studentship
Harmonizing String and Unification-based Methodology with Machine Learning for Text Mining and Processing
- Award number: RGPIN-2019-05683
- Fiscal year: 2022
- Amount: $449,700
- Award type: Discovery Grants Program - Individual
Semantic Representations for Interactive Text Mining
- Award number: RGPIN-2020-04834
- Fiscal year: 2021
- Amount: $449,700
- Award type: Discovery Grants Program - Individual
Development of SaaS Software for Talent Acquisition Using Natural Language Processing and Text Mining Algorithms
- Award number: 566995-2021
- Fiscal year: 2021
- Amount: $449,700
- Award type: Applied Research and Development Grants - Level 1
Tools and methods for mining text data for electricity markets
- Award number: 564082-2021
- Fiscal year: 2021
- Amount: $449,700
- Award type: University Undergraduate Student Research Awards