CISE Research Resources: Discourse Penn Treebank and Multimodal FORM: Development of Two Richly Annotated Corpora

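Basic information

  • Award number:
    0224417
  • Principal investigator:
  • Amount:
    $997,800
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Continuing Grant
  • Fiscal year:
    2002
  • Funding country:
    United States
  • Period:
    2002-10-15 to 2006-09-30
  • Status:
    Completed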

Project abstract

EIA-0224417
Aravind K. Joshi
Mark Liberman
University of Pennsylvania
CISE RR: Discourse Penn Treebank and Multimodal FORM: Development of Two Richly Annotated Corpora

This project, providing critical resources for research on discourse modeling and conversational interaction, aims at developing new technologies and systems for information retrieval and human-computer interaction. Centering on the construction of annotated corpora, it will build two large-scale resources, one in the discourse domain and one in the dialog domain:

1. Discourse Penn Treebank (DPTB), and
2. MultiFORM: Augmenting the FORM corpus with body movements, speech, and intonation.

The former project develops a large-scale, reliably annotated corpus that will encode coherence relations associated with discourse connectives, including their argument structure and anaphoric links, thus exposing a clearly defined level of discourse structure and supporting the extraction of a range of inferences associated with discourse connectives. This annotation will be "on top of" the Penn Treebank (PTB) annotations as well as the predicate-argument annotations of PTB (called the Proposition Bank or PropBank). The latter involves FORM, a corpus of gesture-annotated videos that was designed to be extensible in order to eventually represent the entire multimodal experience of conversational interaction. This multimodal FORM, MultiFORM, will be created by adding body movement, speech and syntactic structure, and intonation.

Large-scale annotated corpora have played a critical role in speech and natural language research by enabling large-scale integration of statistical knowledge (derived from the corpora) with linguistic knowledge (as represented in annotations), leading to scientific and technological advances. Representative examples include robust parsing and automatic extraction of relations and coreferences, and their applications to information extraction, question answering, summarization, and machine translation. PTB, a resource developed a decade ago, is an example of such a resource that has had an impact on natural language processing worldwide. PTB deals with corpora at the sentence level, which warrants new large-scale, reliably annotated corpora of discourse and dialog structure.

Although intellectual and practical connections exist between studies of the structures of discourse and dialog, the initial requirements for resources to study these areas diverge while overlapping in conception. On the discourse side, we need corpora that deal with the kinds of structures found in composed text such as journalistic articles. The dialog side needs to focus on interactions among people and on extemporized rather than pre-composed material.

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Aravind Joshi

Cogniac: a discourse processing engine
  • DOI:
  • Year published:
    1995
  • Journal:
  • Impact factor:
    0
  • Authors:
    F. B. Baldwin; Aravind Joshi
  • Corresponding author:
    Aravind Joshi
Quantum Circuit Optimization of Arithmetic circuits using ZX Calculus
  • DOI:
    10.48550/arxiv.2306.02264
  • Year published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Aravind Joshi; Akshara Kairali; Renju Raju; A. Athreya; R. Monica; Sanjay Vishwakarma; Srinjoy Ganguly
  • Corresponding author:
    Srinjoy Ganguly

Other grants held by Aravind Joshi

CI: ADDO-EN: Significant Enhancement of the Existing Penn Discourse Treebank
  • Award number:
    1059353
  • Fiscal year:
    2011
  • Amount awarded:
    $997,800
  • Award type:
    Standard Grant
RI: Exploiting and Exploring Discourse Connectivity: Deriving New Technology and Knowledge from the Penn Discourse Treebank
  • Award number:
    0705671
  • Fiscal year:
    2007
  • Amount awarded:
    $997,800
  • Award type:
    Continuing Grant
Metagrammatical Knowledge for Grammars and Corpora
  • Award number:
    0414409
  • Fiscal year:
    2004
  • Amount awarded:
    $997,800
  • Award type:
    Continuing Grant
ITR: Mining the Bibliome -- Information Extraction from the Biomedical Literature
  • Award number:
    0205448
  • Fiscal year:
    2002
  • Amount awarded:
    $997,800
  • Award type:
    Continuing Grant
ITR: Language, Learning, and Modeling Biological Sequences
  • Award number:
    0205456
  • Fiscal year:
    2002
  • Amount awarded:
    $997,800
  • Award type:
    Continuing Grant
Constructing Science: Materials and Activities for Kindergarten and First-Grade
  • Award number:
    9252885
  • Fiscal year:
    1992
  • Amount awarded:
    $997,800
  • Award type:
    Continuing Grant
Research in Natural Language Processing: Mathematical and Computational Investigations in Constrained Grammatical Formalisms
  • Award number:
    9016592
  • Fiscal year:
    1991
  • Amount awarded:
    $997,800
  • Award type:
    Continuing Grant
Center for Research in Cognitive Science
  • Award number:
    8920230
  • Fiscal year:
    1991
  • Amount awarded:
    $997,800
  • Award type:
    Cooperative Agreement
Natural Language Processing (Computer Research)
  • Award number:
    8410413
  • Fiscal year:
    1984
  • Amount awarded:
    $997,800
  • Award type:
    Continuing Grant
Modelling Interactive Processes: Flexible Communication With Knowledge Bases
  • Award number:
    8219196
  • Fiscal year:
    1983
  • Amount awarded:
    $997,800
  • Award type:
    Continuing Grant

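Similar NSFC (National Natural Science Foundation of China) grants

Research on Quantum Field Theory without a Lagrangian Description
  • Award number:
    24ZR1403900
  • Year approved:
    2024
  • Amount awarded:
    CNY 0
  • Award type:
    Provincial/municipal project
Cell Research
  • Award number:
    31224802
  • Year approved:
    2012
  • Amount awarded:
    CNY 240,000
  • Award type:
    Special fund project
Cell Research
  • Award number:
    31024804
  • Year approved:
    2010
  • Amount awarded:
    CNY 240,000
  • Award type:
    Special fund project
Cell Research
  • Award number:
    30824808
  • Year approved:
    2008
  • Amount awarded:
    CNY 240,000
  • Award type:
    Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
  • Award number:
    10774081
  • Year approved:
    2007
  • Amount awarded:
    CNY 450,000
  • Award type:
    General Program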

Similar overseas grants

Collaborative Research: CISE-MSI: RPEP: CPS: A Resilient Cyber-Physical Security Framework for Next-Generation Distributed Energy Resources at Grid Edge
  • Award number:
    2219733
  • Fiscal year:
    2022
  • Amount awarded:
    $997,800
  • Award type:
    Standard Grant
Collaborative Research: CISE-MSI: RPEP: CPS: A Resilient Cyber-Physical Security Framework for Next-Generation Distributed Energy Resources at Grid Edge
  • Award number:
    2219734
  • Fiscal year:
    2022
  • Amount awarded:
    $997,800
  • Award type:
    Standard Grant
CISE Research Resources: Matching Advanced Visualization and Intelligent Data Mining to High-Performance Experimental Networks
  • Award number:
    0224306
  • Fiscal year:
    2002
  • Amount awarded:
    $997,800
  • Award type:
    Continuing Grant
CISE Research Resources: Resources for Software Engineering Research
  • Award number:
    0224368
  • Fiscal year:
    2002
  • Amount awarded:
    $997,800
  • Award type:
    Continuing Grant
CISE Research Resources: Instrumentation for Experimental Research in Machine Learning, Collaborative Filtering, and Virtual Environments
  • Award number:
    0224012
  • Fiscal year:
    2002
  • Amount awarded:
    $997,800
  • Award type:
    Standard Grant
CISE Research Resources: Collaborative Research Resources: Collaborative Data Analysis and Visualization
  • Award number:
    0224424
  • Fiscal year:
    2002
  • Amount awarded:
    $997,800
  • Award type:
    Continuing Grant
CISE Research Resources: R4: Rescue Robots for Research and Response
  • Award number:
    0224401
  • Fiscal year:
    2002
  • Amount awarded:
    $997,800
  • Award type:
    Continuing Grant
CISE Research Resources: Teams of Miniature Mobile Robots
  • Award number:
    0224363
  • Fiscal year:
    2002
  • Amount awarded:
    $997,800
  • Award type:
    Continuing Grant
CISE Research Resources: Instrumentation Support for Very Large Data Stores
  • Award number:
    0224439
  • Fiscal year:
    2002
  • Amount awarded:
    $997,800
  • Award type:
    Standard Grant
CISE Research Resources: Infrastructure for Research in Parallel and Distributed Computing
  • Award number:
    0224469
  • Fiscal year:
    2002
  • Amount awarded:
    $997,800
  • Award type:
    Standard Grant