CI-P: Towards the Creation of a Unified Repository for MultiLingual and CrossLingual Multiword Expressions
Basic Information
- Award number: 1513116
- Amount: $100,000
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2015
- Funding country: United States
- Period: 2015-06-15 to 2017-08-31
- Status: Completed
Project Abstract
Single concepts that cross word boundaries, such as "kick the bucket" and "traffic light", are typically referred to as multiword expressions (MWEs). MWE usage is pervasive in natural languages, and MWEs pose a significant challenge from a processing perspective. Explicitly identifying, classifying, and modeling MWEs in text has been shown to improve natural language processing (NLP) technologies. The objective of this proposal is to conduct exploratory research and build consensus towards the creation of a Unified Repository for Multilingual and Cross-lingual MWEs that cuts across different languages (Arabic, Chinese, English, Spanish, Persian), genres, and domains while adding contextual references and links to existing lexical resources. Understanding the space of MWEs across different languages will have a significant impact on multilingual and cross-lingual NLP applications as well as on studies in linguistics in general. Such a large-scale resource will support disciplined investigation into language universals and language-specific phenomena by adding insights into shared and varying cultural and linguistic concepts, providing grounds for extensive typological and etymological studies of how concepts are formed across peoples. MWEs occupy a significant portion of the semantic space. With a deeper understanding of MWEs, their nature, behavior, and usage, NLP practitioners can build more robust systems to achieve the goal of Natural Language Understanding (NLU). Looking at a diverse set of languages simultaneously should lead to more insights into language universals and language-specific phenomena, with a profound impact on how we design overall natural language solutions.
The objective of this proposal is to conduct exploratory research and build consensus towards the creation of a Unified Repository for Multilingual and Cross-lingual MWEs that cuts across different genres and domains while adding contextual references and links to existing lexical resources such as WordNet. The vision is for a repository that is consistently annotated, with links across five languages: Arabic, Chinese, English, Spanish, and Persian. This grant will support the following: i) investigating universal linguistic information that is common across MWEs in various languages, as well as identifying differentiating features that are language-specific or found in typologically related languages; ii) conducting pilot annotation studies on essential annotation tasks; iii) building the basic infrastructure for harvesting, storing, and annotating the data; iv) developing tools to bootstrap the envisioned resource from existing resources; and v) organizing two 2-day workshops to gather input from leading researchers in the field and to build consensus on key issues for such a resource.
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Mona Diab
Combining Discrete Wavelet and Cosine Transforms for Efficient Sentence Embedding
- DOI: 10.5121/csit.2024.141006
- Year: 2024
- Impact factor: 0
- Authors: R. Salama; Abdou Youssef; Mona Diab
- Corresponding author: Mona Diab
Improving Coherence of Language Model Generation with Latent Semantic State
- Year: 2022
- Impact factor: 0
- Authors: Amanda Askell; Yuntao Bai; Anna Chen; Dawn Drain; Deep Ganguli; T. Henighan; Andy Jones; Benjamin Mann; Nova Dassarma; Nelson El; Zac Hatfield; Danny Hernandez; John Kernion; Kamal Ndousse; Catherine Olsson; Dario Amodei; Tom Brown; J. Clark; Sam Mc; Chris Olah; Jared Kaplan; Nick Ryder; Jared D Subbiah; Prafulla Kaplan; A. Dhariwal; P. Neelakantan; Girish Shyam; Amanda Sastry; Sandhini Askell; Ariel Agarwal; Herbert; Gretchen Krueger; R. Child; Aditya Ramesh; Daniel M. Ziegler; Jeffrey Wu; Christopher Winter; Mark Hesse; Eric Chen; Mateusz Sigler; Scott teusz Litwin; Benjamin Gray; Jack Chess; Christopher Clark; Sam Berner; Alec McCandlish; Ilya Radford; Sutskever Dario; Amodei; Joshua Maynez; Shashi Narayan; Bernd Bohnet; Kurt Shuster; Spencer Poff; Moya Chen; Douwe Kiela; Shane Storks; Qiaozi Gao; Yichi Zhang; Joyce Chai; Niket Tandon; Keisuke Sakaguchi; Bhavana Dalvi; Dheeraj Rajagopal; Peter Clark; Michal Guerquin; Kyle Richardson; Eduard H. Hovy; A. Dataset; Rowan Zellers; Ari Holtzman; Matthew E. Peters; Roozbeh Mottaghi; Aniruddha Kembhavi; Ali Farhadi; Chunting Zhou; Graham Neubig; Jiatao Gu; Mona Diab; Francisco Guzmán; Luke Zettlemoyer
- Corresponding author: Luke Zettlemoyer
Investigating Cultural Alignment of Large Language Models
- Year: 2024
- Impact factor: 0
- Authors: Badr AlKhamissi; Muhammad N. ElNokrashy; Mai AlKhamissi; Mona Diab
- Corresponding author: Mona Diab
Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients
- Year: 2024
- Impact factor: 0
- Authors: Aashiq Muhamed; Oscar Li; David Woodruff; Mona Diab; Virginia Smith
- Corresponding author: Virginia Smith
Empirical Evaluation of Topic Zero- and Few-Shot Learning for Stance Dissonance Detection
- Year: 2021
- Impact factor: 0
- Authors: Emily Allaway; Malavika Srikanth; Kathleen McK; Samuel R. Bowman; Gabor Angeli; Christopher Potts; Daniel Cer; Mona Diab; Eneko Agirre; Iñigo Lopez
- Corresponding author: Iñigo Lopez
Other grants held by Mona Diab
CI-ADDO-NEW: Collaborative Research: A Repository for Annotating Multilingual Code Switched Data
- Award number: 1343530
- Fiscal year: 2013
- Amount: $100,000
- Project type: Standard Grant
CI-ADDO-NEW: Collaborative Research: A Repository for Annotating Multilingual Code Switched Data
- Award number: 1205556
- Fiscal year: 2012
- Amount: $100,000
- Project type: Standard Grant
Collaborative Research: CI-P: Creation of an annotated repository of multilingual and multigenre code switched data for several language pairs
- Award number: 0958440
- Fiscal year: 2010
- Amount: $100,000
- Project type: Standard Grant
SGER: Automatic Processing of Natural Language Code Switching
- Award number: 0749062
- Fiscal year: 2007
- Amount: $100,000
- Project type: Standard Grant
Similar grants from the National Natural Science Foundation of China (NSFC)
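Role and mechanism of the SHP2-regulated plastic conversion of Treg cells into Th2-like Treg cells in allergic rhinitis
- Award number: 82301281
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund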
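Numerical simulation of core-edge compatibility mechanisms in the high poloidal beta operating regime of EAST
- Award number: 12375228
- Approval year: 2023
- Amount: CNY 530,000
- Project type: General Program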
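CXCR5-dependent delivery of exosomes by marginal zone B cells to follicular dendritic cells as a trigger of cardiac transplant rejection
- Award number: 82300460
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund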
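Mechanism by which Dlx2 affects the osteogenic differentiation of maxillary process mesenchymal stem cells through regulation of Tspan13
- Award number: 82301008
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund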
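A new mechanism of ventricular remodeling in diabetic cardiomyopathy: how high-glucose-induced reprogramming of macrophage lipid metabolism regulates the transdifferentiation of cardiac fibroblasts into myofibroblasts via the bioactive lipid MA
- Award number: 82300404
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund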
Similar overseas grants
Towards Embedding Responsible AI in the School System: Co-Creation with Young People
- Award number: AH/Z505560/1
- Fiscal year: 2024
- Amount: $100,000
- Project type: Research Grant
Transforming Tourism on East Asian Islands: Towards New Spatial Creation in the Post-Corona Era
- Award number: 23H03644
- Fiscal year: 2023
- Amount: $100,000
- Project type: Grant-in-Aid for Scientific Research (B)
CAREER: Towards the Creation of a Dynamic Modeling Framework to Generate New Knowledge About Swimming Biological Systems
- Award number: 2238432
- Fiscal year: 2023
- Amount: $100,000
- Project type: Standard Grant
NSF Convergence Accelerator Track H: Towards a Community-Driven Framework for the Creation and Impact Analysis of Digital Accessibility Maps with Persons with Disabilities
- Award number: 2340870
- Fiscal year: 2023
- Amount: $100,000
- Project type: Standard Grant
Challenge towards creation of oocyte via parthenogenetically replicated female genome (PRFG)
- Award number: 23K08793
- Fiscal year: 2023
- Amount: $100,000
- Project type: Grant-in-Aid for Scientific Research (C)