CI-NEW: NIEUW: Novel Incentives and Workflows in Linguistic Data Collection and Annotation
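Basic information
- Award number: 1730377
- Principal investigator:
- Amount: $1,218,500
- Host institution:
- Host institution country: United States
- Project category: Standard Grant
- Fiscal year: 2017
- Funding country: United States
- Duration: 2017-07-15 to 2023-06-30
- Project status: Completed
- Source:
- Keywords:
Project abstract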
Language touches every aspect of human life. People speak and write in order to manage relationships from the personal to the international, to gather and provide information, to negotiate, influence and inspire. Scientists use language to communicate their findings regardless of their field of study. Although researchers have been working for six decades to process language via computer, only in the past several years have their efforts produced technologies of sufficient maturity that they can affect the lives of the average citizen. Today, some of the most fortunate use computers to search the vast archives of the Internet, to translate material from languages they do not understand into languages they do, and to interact with smart devices by giving them natural language commands and queries and receiving responses in kind. Despite the growth and promise of human language technologies, they are in fact available for only a tiny portion of the world's approximately 7000 languages and, even then, for only a limited range of situations. This is the case because the approaches that have proven most successful in developing human language technologies require vast amounts of spoken or written language material that have been augmented by human judgment as to their interpretation, but such resources are lacking for most languages and for many types of situations, even for languages of international importance, including English. This Research Infrastructure project will address this shortage of language resources by supporting the language technology research community to employ novel incentives and alternate workflows to greatly expand the methods that have been used to date for collecting and annotating language data. The resulting resources will support research and development on an expanded range of language technologies, leading to the creation and deployment of applications for an increasingly broad range of languages and situations.
Even a brief observation of user behavior on social media, online games, citizen science and public good initiatives demonstrates that many people around the world are willing to devote collectively vast amounts of effort when given appropriate motivation and effective tools. This project will harness some of the immense people-power that drives such activities and focus it on problems of developing language resources that help computers learn to process language. Specifically, the project team will develop a software toolkit, in response to the needs of language technology researchers, for creating online activities that yield language resources. The activities will include games, citizen science and tools for language professionals, clustered into a series of portals that appeal to different populations of users. The project will build and maintain the database and web servers, with redundancy, load balancing and failover, to run the principal instance of all of the activities, and an open-source release of the software will enable other researchers to build their own instances independently. Finally, the data resulting from this project will be shared under the least restrictive terms possible to further support language technology research and development activities worldwide.
Project outputs
Journal articles (5)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Using Games to Augment Corpora for Language Recognition and Confusability
- DOI: 10.21437/interspeech.2021-1611
- Year published: 2021
- Journal:
- Impact factor: 0
- Authors: Cieri, Christopher; Fiumara, James; Wright, Jonathan
- Corresponding author: Wright, Jonathan
LanguageARC: Developing Language Resources Through Citizen Linguistics
- DOI:
- Year published: 2020
- Journal:
- Impact factor: 0
- Authors: Fiumara, James; Cieri, Christopher; Wright, Jonathan; Liberman, Mark
- Corresponding author: Liberman, Mark
LanguageARC – a tutorial
- DOI:
- Year published: 2020
- Journal:
- Impact factor: 0
- Authors: Cieri, Christopher; Fiumara, James
- Corresponding author: Fiumara, James
Reflections on 30 Years of Language Resource Development and Sharing
- DOI:
- Year published: 2022
- Journal:
- Impact factor: 0
- Authors: Christopher Cieri, Mark Liberman
- Corresponding author: Christopher Cieri, Mark Liberman
Proceedings of the Workshop on Citizen Linguistics in Language Resource Development (CLLRD 2020)
- DOI:
- Year published: 2020
- Journal:
- Impact factor: 0
- Authors: Fiumara, James; Cieri, Christopher; Liberman, Mark; Callison-Burch, Chris
- Corresponding author: Callison-Burch, Chris
Other publications by Mark Liberman
Dimensions of Speech and Language Disturbance in Psychosis and Computational Linguistic Markers
- DOI: 10.1016/j.biopsych.2022.02.144
- Date published: 2022-05-01
- Journal:
- Impact factor:
- Authors: Sunny Tang; Katrin Hänsel; Yan Cong; Sarah Berretta; Sunghye Cho; Amir Nikzad; Aarush Mehta; Sameer Pradhan; James Fiumara; Mark Liberman
- Corresponding author: Mark Liberman
Ruptured Appendicitis after Laparoscopic Roux-en-Y Gastric Bypass: Pitfalls in Diagnosing a Surgical Abdomen in the Morbidly Obese
- DOI: 10.1381/096089203322618812
- Date published: 2003-12-01
- Journal:
- Impact factor: 3.100
- Authors: Amir Mehran; Mark Liberman; Raul Rosenthal; Samuel Szomstein
- Corresponding author: Samuel Szomstein
CLiFF Notes: Research in the Language, Information and Computation Laboratory of the University of Pennsylvania
- DOI:
- Date published: 1995
- Journal:
- Impact factor: 0
- Authors: Norm Badler; F. B. Baldwin; Nicola J. Bessell; Eric Brill; Sharon Cote; Barbara Di Eugenio; Alexis Dimitriadis; Jon Freeman; Christopher W. Geib; A. Gertner; Daniel Hardt; Michael Hegarty; Shyam Kapur; Jonathan Kaye; Michael H. Kelly; Libby Levison; Mark Liberman; D. R. Mani; Mitch Marcus Michael; B. Moore; Michael Niv; Charles L. Ortiz; Jong Cheol Park; Sandeep Prasada Scott
- Corresponding author: Sandeep Prasada Scott
l / VARIATION IN AMERICAN ENGLISH : A CORPUS
- DOI:
- Date published: 2012
- Journal:
- Impact factor: 0
- Authors: Jiahong Yuan; Mark Liberman
- Corresponding author: Mark Liberman
LOOKING BACK, MOVING FORWARD Why underlying representations?
- DOI:
- Date published:
- Journal:
- Impact factor: 0
- Authors: Larry M. Hyman; Jeffrey Heinz; Sharon Inkelas; Keith Johnson; Mark Liberman
- Corresponding author: Mark Liberman
Other grants by Mark Liberman
Language Preservation 2.0: Crowdsourcing Oral Language Documentation using Mobile Devices
- Award number: 1160639
- Fiscal year: 2012
- Funding amount: $1,218,500
- Project category: Standard Grant
Prosodic Systems in New Guinea: Integrating computational and typological approaches to linguistic analysis
- Award number: 0951651
- Fiscal year: 2010
- Funding amount: $1,218,500
- Project category: Standard Grant
Collaborative Research: OLAC: Accessing the World's Language Resources
- Award number: 0723357
- Fiscal year: 2007
- Funding amount: $1,218,500
- Project category: Continuing Grant
ITR-SCOTUS: A Resource for Collaborative Research in Speech Technology, Linguistics, Decision Processes and the Law
- Award number: 0325739
- Fiscal year: 2003
- Funding amount: $1,218,500
- Project category: Continuing Grant
Electronic Materials For Natural Language Research
- Award number: 9113530
- Fiscal year: 1991
- Funding amount: $1,218,500
- Project category: Standard Grant
Similar overseas grants
Assessment of new fatigue capable titanium alloys for aerospace applications
- Award number: 2879438
- Fiscal year: 2027
- Funding amount: $1,218,500
- Project category: Studentship
Development of a new solid tritium breeder blanket
- Award number: 2908923
- Fiscal year: 2027
- Funding amount: $1,218,500
- Project category: Studentship
Collaborative Research: REU Site: Earth and Planetary Science and Astrophysics REU at the American Museum of Natural History in Collaboration with the City University of New York
- Award number: 2348998
- Fiscal year: 2025
- Funding amount: $1,218,500
- Project category: Standard Grant
New approaches to training deep probabilistic models
- Award number: 2613115
- Fiscal year: 2025
- Funding amount: $1,218,500
- Project category: Studentship
Collaborative Research: REU Site: Earth and Planetary Science and Astrophysics REU at the American Museum of Natural History in Collaboration with the City University of New York
- Award number: 2348999
- Fiscal year: 2025
- Funding amount: $1,218,500
- Project category: Standard Grant
PINK - Provision of Integrated Computational Approaches for Addressing New Markets Goals for the Introduction of Safe-and-Sustainable-by-Design Chemicals and Materials
- Award number: 10097944
- Fiscal year: 2024
- Funding amount: $1,218,500
- Project category: EU-Funded
Royal Holloway and Bedford New College and Rubberatkins Limited KTP 23_24 R1
- Award number: 10074401
- Fiscal year: 2024
- Funding amount: $1,218,500
- Project category: Knowledge Transfer Partnership
Removal of Perfluorinated Chemicals Using New Fluorinated Polymer Sorbents
- Award number: LP220100036
- Fiscal year: 2024
- Funding amount: $1,218,500
- Project category: Linkage Projects
Big time crystals: a new paradigm in condensed matter
- Award number: DP240101590
- Fiscal year: 2024
- Funding amount: $1,218,500
- Project category: Discovery Projects
Data Driven Discovery of New Catalysts for Asymmetric Synthesis
- Award number: DP240100102
- Fiscal year: 2024
- Funding amount: $1,218,500
- Project category: Discovery Projects