CI-P: Towards the Creation of a Unified Repository for MultiLingual and CrossLingual Multiword Expressions
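Basic information
- Award number: 1513116
- Principal investigator:
- Amount: $100,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2015
- Funding country: United States
- Period: 2015-06-15 to 2017-08-31
- Status: Completed
- Source:
- Keywords:
Project abstract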
Single concepts that cross word boundaries, such as "kick the bucket" and "traffic light", are typically referred to as multiword expressions (MWEs). MWE usage is pervasive in natural languages, and MWEs pose a significant challenge from a processing perspective. Explicitly identifying, classifying, and modeling MWEs in text has been shown to improve natural language processing (NLP) technologies. The objective of this proposal is to conduct exploratory research and build consensus towards the creation of a Unified Repository for Multilingual and Cross-lingual MWEs cutting across different languages (Arabic, Chinese, English, Spanish, Persian), genres, and domains, while adding contextual references and links to existing lexical resources. Understanding the space of MWEs across different languages will have a significant impact on multilingual and cross-lingual NLP applications as well as on studies in linguistics in general. Such a large-scale resource will support disciplined investigation of language universals and language-specific phenomena, offering insights into shared and varying cultural and linguistic concepts as grounds for extensive typological and etymological studies of how concepts are formed across peoples. MWEs occupy a significant portion of the semantic space. With a deeper understanding of MWEs, their nature, behavior, and usage, NLP practitioners can build more robust systems to achieve the goal of Natural Language Understanding (NLU). Looking at a diverse set of languages simultaneously would lead to more insights into language universals and language-specific phenomena, with a profound impact on how we design overall natural language solutions.
The objective of this proposal is to conduct exploratory research and build consensus towards the creation of a Unified Repository for Multilingual and Cross-lingual MWEs cutting across different genres and domains, while adding contextual references and links to existing lexical resources such as WordNet. The vision is for a repository that is consistently annotated with links across five different languages: Arabic, Chinese, English, Spanish, and Persian. This grant will support the following: i) investigating universal linguistic information that is common across MWEs in various languages, as well as identifying differentiating features that are language-specific or found in typologically related languages; ii) conducting pilot annotation studies on essential annotation tasks; iii) building the basic infrastructure for harvesting, storing, and annotating the data; iv) developing tools to bootstrap the envisioned resource from existing resources; and v) organizing two 2-day workshops to gather input from leading researchers in the field and to build consensus on key issues for such a resource.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Mona Diab
Improving Coherence of Language Model Generation with Latent Semantic State
- DOI:
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: Amanda Askell;Yuntao Bai;Anna Chen;Dawn Drain;Deep Ganguli;T. Henighan;Andy Jones;Benjamin Mann;Nova Dassarma;Nelson El;Zac Hatfield;Danny Hernandez;John Kernion;Kamal Ndousse;Catherine Olsson;Dario Amodei;Tom Brown;J. Clark;Sam Mc;Chris Olah;Jared Kaplan;Nick Ryder;Jared D Subbiah;Prafulla Kaplan;A. Dhariwal;P. Neelakantan;Girish Shyam;Amanda Sastry;Sandhini Askell;Ariel Agarwal;Herbert;Gretchen Krueger;R. Child;Aditya Ramesh;Daniel M. Ziegler;Jeffrey Wu;Christopher Winter;Mark Hesse;Eric Chen;Mateusz Sigler;Scott teusz Litwin;Benjamin Gray;Jack Chess;Christopher Clark;Sam Berner;Alec McCandlish;Ilya Radford;Sutskever Dario;Amodei;Joshua Maynez;Shashi Narayan;Bernd Bohnet;Kurt Shuster;Spencer Poff;Moya Chen;Douwe Kiela;Shane Storks;Qiaozi Gao;Yichi Zhang;Joyce Chai;Niket Tandon;Keisuke Sakaguchi;Bhavana Dalvi;Dheeraj Rajagopal;Peter Clark;Michal Guerquin;Kyle Richardson;Eduard H. Hovy;A. Dataset;Rowan Zellers;Ari Holtzman;Matthew E. Peters;Roozbeh Mottaghi;Aniruddha Kembhavi;Ali Farhadi;Chunting Zhou;Graham Neubig;Jiatao Gu;Mona Diab;Francisco Guzmán;Luke Zettlemoyer
- Corresponding author: Luke Zettlemoyer
Investigating Cultural Alignment of Large Language Models
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Badr AlKhamissi;Muhammad N. ElNokrashy;Mai AlKhamissi;Mona Diab
- Corresponding author: Mona Diab
Arabic natural language processing for Qur’anic research: a systematic review
- DOI: 10.1007/s10462-022-10313-2
- Published: 2022-12-02
- Journal:
- Impact factor: 13.900
- Authors: Muhammad Huzaifa Bashir;Aqil M. Azmi;Haq Nawaz;Wajdi Zaghouani;Mona Diab;Ala Al-Fuqaha;Junaid Qadir
- Corresponding author: Junaid Qadir
Combining Discrete Wavelet and Cosine Transforms for Efficient Sentence Embedding
- DOI: 10.5121/csit.2024.141006
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: R. Salama;Abdou Youssef;Mona Diab
- Corresponding author: Mona Diab
Author Correction: Arabic natural language processing for Qur’anic research: a systematic review
- DOI: 10.1007/s10462-023-10390-x
- Published: 2023-03-24
- Journal:
- Impact factor: 13.900
- Authors: Muhammad Huzaifa Bashir;Aqil M. Azmi;Haq Nawaz;Wajdi Zaghouani;Mona Diab;Ala Al-Fuqaha;Junaid Qadir
- Corresponding author: Junaid Qadir
Other grants by Mona Diab
CI-ADDO-NEW: Collaborative Research: A Repository for Annotating Multilingual Code Switched Data
- Award number: 1343530
- Fiscal year: 2013
- Amount: $100,000
- Project type: Standard Grant
CI-ADDO-NEW: Collaborative Research: A Repository for Annotating Multilingual Code Switched Data
- Award number: 1205556
- Fiscal year: 2012
- Amount: $100,000
- Project type: Standard Grant
Collaborative Research: CI-P: Creation of an annotated repository of multilingual and multigenre code switched data for several language pairs
- Award number: 0958440
- Fiscal year: 2010
- Amount: $100,000
- Project type: Standard Grant
SGER: Automatic Processing of Natural Language Code Switching
- Award number: 0749062
- Fiscal year: 2007
- Amount: $100,000
- Project type: Standard Grant
Similar overseas grants
Towards Embedding Responsible AI in the School System: Co-Creation with Young People
- Award number: AH/Z505560/1
- Fiscal year: 2024
- Amount: $100,000
- Project type: Research Grant
Transforming Tourism on East Asian Islands: Towards New Spatial Creation in the Post-Corona Era
- Award number: 23H03644
- Fiscal year: 2023
- Amount: $100,000
- Project type: Grant-in-Aid for Scientific Research (B)
CAREER: Towards the Creation of a Dynamic Modeling Framework to Generate New Knowledge About Swimming Biological Systems
- Award number: 2238432
- Fiscal year: 2023
- Amount: $100,000
- Project type: Standard Grant
NSF Convergence Accelerator Track H: Towards a Community-Driven Framework for the Creation and Impact Analysis of Digital Accessibility Maps with Persons with Disabilities
- Award number: 2340870
- Fiscal year: 2023
- Amount: $100,000
- Project type: Standard Grant
Challenge towards creation of oocyte via parthenogenetically replicated female genome (PRFG)
- Award number: 23K08793
- Fiscal year: 2023
- Amount: $100,000
- Project type: Grant-in-Aid for Scientific Research (C)
Moving Towards Sustainable Watershed Management: Integrating Environmental Flows into Environmental Assessment through Co-creation with Indigenous Nations
- Award number: 576180-2022
- Fiscal year: 2022
- Amount: $100,000
- Project type: Vanier Canada Graduate Scholarship Tri-Council - Doctoral 3 years
Creation of the Novel Interventions Caution (NIC) Measure to Assess Patient Behavior Towards Unregulated Stem Cell Interventions
- Award number: 10684297
- Fiscal year: 2022
- Amount: $100,000
- Project type:
Creation of the Novel Interventions Caution (NIC) Measure to Assess Patient Behavior Towards Unregulated Stem Cell Interventions
- Award number: 10507963
- Fiscal year: 2022
- Amount: $100,000
- Project type:
NSF Convergence Accelerator Track H: Towards a Community-Driven Framework for the Creation and Impact Analysis of Digital Accessibility Maps with Persons with Disabilities
- Award number: 2235944
- Fiscal year: 2022
- Amount: $100,000
- Project type: Standard Grant
Development of Next-generation Semi-Structured Data Mining Technology Towards The Real-World Knowledge Creation Infrastructure
- Award number: 20H00595
- Fiscal year: 2020
- Amount: $100,000
- Project type: Grant-in-Aid for Scientific Research (A)













