CISE Research Resources: Discourse Penn Treebank and Multimodal FORM: Development of Two Richly Annotated Corpora
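Basic information
- Award number: 0224417
- Principal investigator:
- Amount: $997,800
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2002
- Funding country: United States
- Period: 2002-10-15 to 2006-09-30
- Status: Completed
- Source:
- Keywords:
Project abstract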
EIA-0224417
Aravind K. Joshi, Mark Liberman
University of Pennsylvania
CISE RR: Discourse Penn Treebank and Multimodal FORM: Development of Two Richly Annotated Corpora

This project provides critical resources for research on discourse modeling and conversational interaction, with the aim of developing new technologies and systems for information retrieval and human-computer interaction. It centers on the construction of two large-scale annotated corpora, one in the discourse domain and one in the dialog domain:
1. Discourse Penn Treebank (DPTB)
2. MultiFORM: Augmenting the FORM corpus with body movements, speech, and intonation

The first effort develops a large-scale, reliably annotated corpus that encodes the coherence relations associated with discourse connectives, including their argument structure and anaphoric links. This exposes a clearly defined level of discourse structure and supports the extraction of a range of inferences associated with discourse connectives. The annotation will sit "on top of" the Penn Treebank (PTB) annotations as well as the predicate-argument annotations of PTB (called the Proposition Bank or Prop Bank). The second effort builds on FORM, a corpus of gesture-annotated videos that was designed to be extensible so that it could eventually represent the entire multimodal experience of conversational interaction. The multimodal version, MultiFORM, will be created by adding body movement, speech and syntactic structure, and intonation.

Large-scale annotated corpora have played a critical role in speech and natural language research by enabling large-scale integration of statistical knowledge (derived from the corpora) with linguistic knowledge (as represented in annotations), leading to scientific and technological advances. Representative examples include robust parsing and the automatic extraction of relations and coreference, with applications to information extraction, question answering, summarization, and machine translation. PTB, developed a decade ago, is one such resource and has influenced natural language processing worldwide. Because PTB annotates corpora only at the sentence level, new large-scale, reliably annotated corpora of discourse and dialog structure are warranted. Although there are intellectual and practical connections between the study of discourse structure and the study of dialog structure, the initial resource requirements for the two areas diverge even where they overlap in conception. On the discourse side, we need corpora that deal with the kinds of structures found in composed text such as journalistic articles. The dialog side needs to focus on interactions among people and on extemporized rather than pre-composed material.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Aravind Joshi
Cogniac: a discourse processing engine
- DOI:
- Year of publication: 1995
- Journal:
- Impact factor: 0
- Authors: F. B. Baldwin; Aravind Joshi
- Corresponding author: Aravind Joshi
Quantum Circuit Optimization of Arithmetic circuits using ZX Calculus
- DOI: 10.48550/arxiv.2306.02264
- Year of publication: 2023
- Journal:
- Impact factor: 0
- Authors: Aravind Joshi; Akshara Kairali; Renju Raju; A. Athreya; R. Monica; Sanjay Vishwakarma; Srinjoy Ganguly
- Corresponding author: Srinjoy Ganguly
Other grants by Aravind Joshi
CI: ADDO-EN: Significant Enhancement of the Existing Penn Discourse Treebank
- Award number: 1059353
- Fiscal year: 2011
- Amount: $997,800
- Award type: Standard Grant
RI: Exploiting and Exploring Discourse Connectivity: Deriving New Technology and Knowledge from the Penn Discourse Treebank
- Award number: 0705671
- Fiscal year: 2007
- Amount: $997,800
- Award type: Continuing Grant
Metagrammatical Knowledge for Grammars and Corpora
- Award number: 0414409
- Fiscal year: 2004
- Amount: $997,800
- Award type: Continuing Grant
ITR: Mining the Bibliome -- Information Extraction from the Biomedical Literature
- Award number: 0205448
- Fiscal year: 2002
- Amount: $997,800
- Award type: Continuing Grant
ITR: Language, Learning, and Modeling Biological Sequences
- Award number: 0205456
- Fiscal year: 2002
- Amount: $997,800
- Award type: Continuing Grant
Constructing Science: Materials and Activities for Kindergarten and First-Grade
- Award number: 9252885
- Fiscal year: 1992
- Amount: $997,800
- Award type: Continuing Grant
Research in Natural Language Processing: Mathematical and Computational Investigations in Constrained Grammatical Formalisms
- Award number: 9016592
- Fiscal year: 1991
- Amount: $997,800
- Award type: Continuing Grant
Center for Research in Cognitive Science
- Award number: 8920230
- Fiscal year: 1991
- Amount: $997,800
- Award type: Cooperative Agreement
Natural Language Processing (Computer Research)
- Award number: 8410413
- Fiscal year: 1984
- Amount: $997,800
- Award type: Continuing Grant
Modelling Interactive Processes: Flexible Communication With Knowledge Bases
- Award number: 8219196
- Fiscal year: 1983
- Amount: $997,800
- Award type: Continuing Grant
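Similar National Natural Science Foundation of China (NSFC) grants
Research on Quantum Field Theory without a Lagrangian Description
- Award number: 24ZR1403900
- Approval year: 2024
- Amount: CNY 0
- Award type: Provincial/municipal project
Cell Research
- Award number: 31224802
- Approval year: 2012
- Amount: CNY 240,000
- Award type: Special fund project
Cell Research
- Award number: 31024804
- Approval year: 2010
- Amount: CNY 240,000
- Award type: Special fund project
Cell Research
- Award number: 30824808
- Approval year: 2008
- Amount: CNY 240,000
- Award type: Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
- Award number: 10774081
- Approval year: 2007
- Amount: CNY 450,000
- Award type: General program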
Similar overseas grants
Collaborative Research: CISE-MSI: RPEP: CPS: A Resilient Cyber-Physical Security Framework for Next-Generation Distributed Energy Resources at Grid Edge
- Award number: 2219733
- Fiscal year: 2022
- Amount: $997,800
- Award type: Standard Grant
Collaborative Research: CISE-MSI: RPEP: CPS: A Resilient Cyber-Physical Security Framework for Next-Generation Distributed Energy Resources at Grid Edge
- Award number: 2219734
- Fiscal year: 2022
- Amount: $997,800
- Award type: Standard Grant
CISE Research Resources: Matching Advanced Visualization and Intelligent Data Mining to High-Performance Experimental Networks
- Award number: 0224306
- Fiscal year: 2002
- Amount: $997,800
- Award type: Continuing Grant
CISE Research Resources: Resources for Software Engineering Research
- Award number: 0224368
- Fiscal year: 2002
- Amount: $997,800
- Award type: Continuing Grant
CISE Research Resources: Instrumentation for Experimental Research in Machine Learning, Collaborative Filtering, and Virtual Environments
- Award number: 0224012
- Fiscal year: 2002
- Amount: $997,800
- Award type: Standard Grant
CISE Research Resources: Collaborative Research Resources: Collaborative Data Analysis and Visualization
- Award number: 0224424
- Fiscal year: 2002
- Amount: $997,800
- Award type: Continuing Grant
CISE Research Resources: R4: Rescue Robots for Research and Response
- Award number: 0224401
- Fiscal year: 2002
- Amount: $997,800
- Award type: Continuing Grant
CISE Research Resources: Teams of Miniature Mobile Robots
- Award number: 0224363
- Fiscal year: 2002
- Amount: $997,800
- Award type: Continuing Grant
CISE Research Resources: Instrumentation Support for Very Large Data Stores
- Award number: 0224439
- Fiscal year: 2002
- Amount: $997,800
- Award type: Standard Grant
CISE Research Resources: Infrastructure for Research in Parallel and Distributed Computing
- Award number: 0224469
- Fiscal year: 2002
- Amount: $997,800
- Award type: Standard Grant













