Creating anaphorically annotated resources through semantic wikis (AnaWiki)
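Basic information
- Grant number: EP/F00575X/1
- Principal investigator:
- Amount: $182,600
- Host institution:
- Host institution country: United Kingdom
- Project type: Research Grant
- Fiscal year: 2007
- Funding country: United Kingdom
- Duration: 2007 to (no data)
- Project status: Completed
- Source:
- Keywords: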
Project abstract
Progress in Natural Language Processing, both in building better NLP systems and in developing better theories of how humans process language, depends on the availability of large annotated corpora: collections of documents annotated with human judgments about, say, the interpretation of ambiguous words such as 'bank' or 'stock' in a particular context, or the interpretation of anaphoric expressions like 'the corpus'. The fact that current corpora annotated for semantic information are not large enough, and do not collect the judgments of a large enough number of subjects, is therefore a major obstacle for NLP. Creating larger hand-annotated corpora with current methods, however, is very expensive and time-consuming; in practice, it is infeasible to annotate more than 1M words. A variety of techniques for addressing the problem through semi-automatic annotation, such as bootstrapping and active learning, have been proposed in the literature, but their usefulness has not yet been convincingly demonstrated. The success of Wikipedia shows that another approach might be possible: take advantage of the willingness of the Web population to contribute to collaborative resource creation efforts. This willingness has already been harnessed to tag images through the ESP game; we propose to develop tools that will make it possible for large numbers of volunteers over the Web to collaborate in the creation of semantically annotated corpora (specifically, a corpus annotated with coreference information). In this, we will build on existing efforts to develop versions of MediaWiki that support work on the Semantic Web, and on our own efforts to develop reliable and easy-to-follow instructions for marking semantic judgments about anaphora. At the very least, these tools will make it possible for the community of NLP researchers themselves to collaborate in the creation of an Anaphoric Bank.
We will, however, also run a pilot to develop methods for attracting the interest of the Web community at large; if these tests are successful, we may be able to use the power of collaborative effort through the Web to create really large annotated corpora. A distinctive feature of our approach is that we will allow volunteers to mark differences in semantic judgments, and to comment on previously expressed semantic judgments, so as to identify the judgments on which there is wide agreement and those on which there is disagreement.
Project outputs
Journal papers (6)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Optimising crowdsourcing efficiency: Amplifying human computation with validation
- DOI: 10.1515/itit-2017-0020
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Chamberlain J
- Corresponding author: Chamberlain J
A demonstration of human computation using the Phrase Detectives annotation game
- DOI: 10.1145/1600150.1600156
- Published: 2009-06
- Journal:
- Impact factor: 0
- Authors: Jon Chamberlain; Massimo Poesio; Udo Kruschwitz
- Corresponding author: Jon Chamberlain; Massimo Poesio; Udo Kruschwitz
Aggregating crowdsourced and automatic judgments to scale up a corpus of anaphoric reference for fiction and Wikipedia texts
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Yu, J
- Corresponding author: Yu, J
ANAWIKI: Creating anaphorically annotated resources through Web cooperation
- DOI:
- Published: 2008
- Journal:
- Impact factor: 0
- Authors: Poesio M.
- Corresponding author: Poesio M.
Addressing the resource bottleneck to create large-scale annotated texts
- DOI:
- Published: 2008
- Journal:
- Impact factor: 0
- Authors: Chamberlain J.
- Corresponding author: Chamberlain J.
Other publications by Massimo Poesio
Justified Sloppiness In Anaphoric Reference
- DOI: 10.1007/978-1-4020-5958-2_2
- Published: 2008
- Journal:
- Impact factor: 1.5
- Authors: Massimo Poesio; Uwe Reyle; R. Stevenson
- Corresponding author: R. Stevenson
The provision of corrective feedback in a spoken dialogue CALL system
- DOI: 10.21437/icslp.1998-82
- Published: 1998
- Journal:
- Impact factor: 0
- Authors: Sarah Davies; Massimo Poesio
- Corresponding author: Massimo Poesio
Bias decreases in proportion to the number of annotators
- DOI:
- Published: 2005
- Journal:
- Impact factor: 0
- Authors: Ron Artstein; Massimo Poesio
- Corresponding author: Massimo Poesio
Ambiguity, Underspecification and Discourse Interpretation
- DOI:
- Published: 2007
- Journal:
- Impact factor: 0
- Authors: Massimo Poesio
- Corresponding author: Massimo Poesio
State-of-the-art NLP Approaches to Coreference Resolution: Theory and Practical Recipes
- DOI:
- Published: 2009
- Journal:
- Impact factor: 0
- Authors: Simone Paolo Ponzetto; Massimo Poesio
- Corresponding author: Massimo Poesio
Other grants held by Massimo Poesio
Annotating Reference and Coreference In Dialogue Using Conversational Agents in games
- Grant number: EP/W001632/1
- Fiscal year: 2022
- Amount: $182,600
- Project type: Research Grant