CI-ADDO-NEW: Collaborative Research: A Repository for Annotating Multilingual Code Switched Data
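Basic information
- Award number: 1343530
- Principal investigator:
- Amount: $400,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2013
- Funding country: United States
- Project period: 2013-02-01 to 2017-05-31
- Project status: Completed
- Source:
- Keywords:
Project abstract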
Linguistic code switching (LCS) is the practice of switching back and forth between the shared languages of bilingual or multilingual speakers. This phenomenon is particularly prevalent in geographic regions with linguistic boundaries or where there are large immigrant groups. Various levels of language (phonological, morphological, syntactic, semantic and discourse-pragmatic) may be implicated in LCS in different language pairs and/or genres. Computational algorithms trained for a single language quickly break down when the input includes LCS. A major barrier to research on LCS in computational linguistics (CL) has been the lack of large, accurately annotated corpora of LCS data. In this project, a large repository of LCS data is collected and a large annotation infrastructure is developed. The data are consistently annotated in different modalities (speech and text), at various levels of linguistic granularity, and across different language pairs reflecting different linguistic typologies (Standard Arabic and Dialectal Arabic, Arabic-English, Spanish-English, Chinese-English, Hindi-English). The focus of the effort is on intra-sentential LCS.

This infrastructure and the unified large LCS data resource are eagerly awaited by the CL research community, since annotated LCS data provides a natural test-bed for adaptive learning algorithms and the handling of diverse data sources, as well as a framework for genuine multilingual processing. It will also benefit sociolinguistic and theoretical linguistic researchers, and provide a platform for collaborative interdisciplinary research. Finally, research on LCS helps overcome biases against multilingual speakers by demonstrating the creativity of such speakers in exploiting their verbal repertoires. Such a result is particularly important for K-12 education and testing policies in the USA with its diverse immigrant population.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Mona Diab
Improving Coherence of Language Model Generation with Latent Semantic State
- DOI:
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: Amanda Askell;Yuntao Bai;Anna Chen;Dawn Drain;Deep Ganguli;T. Henighan;Andy Jones;Benjamin Mann;Nova Dassarma;Nelson El;Zac Hatfield;Danny Hernandez;John Kernion;Kamal Ndousse;Catherine Olsson;Dario Amodei;Tom Brown;J. Clark;Sam Mc;Chris Olah;Jared Kaplan;Nick Ryder;Jared D Subbiah;Prafulla Kaplan;A. Dhariwal;P. Neelakantan;Girish Shyam;Amanda Sastry;Sandhini Askell;Ariel Agarwal;Herbert;Gretchen Krueger;R. Child;Aditya Ramesh;Daniel M. Ziegler;Jeffrey Wu;Christopher Winter;Mark Hesse;Eric Chen;Mateusz Sigler;Scott teusz Litwin;Benjamin Gray;Jack Chess;Christopher Clark;Sam Berner;Alec McCandlish;Ilya Radford;Sutskever Dario;Amodei;Joshua Maynez;Shashi Narayan;Bernd Bohnet;Kurt Shuster;Spencer Poff;Moya Chen;Douwe Kiela;Shane Storks;Qiaozi Gao;Yichi Zhang;Joyce Chai;Niket Tandon;Keisuke Sakaguchi;Bhavana Dalvi;Dheeraj Rajagopal;Peter Clark;Michal Guerquin;Kyle Richardson;Eduard H. Hovy;A. Dataset;Rowan Zellers;Ari Holtzman;Matthew E. Peters;Roozbeh Mottaghi;Aniruddha Kembhavi;Ali Farhadi;Chunting Zhou;Graham Neubig;Jiatao Gu;Mona Diab;Francisco Guzmán;Luke Zettlemoyer
- Corresponding author: Luke Zettlemoyer
Investigating Cultural Alignment of Large Language Models
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Badr AlKhamissi;Muhammad N. ElNokrashy;Mai AlKhamissi;Mona Diab
- Corresponding author: Mona Diab
Arabic natural language processing for Qur’anic research: a systematic review
- DOI: 10.1007/s10462-022-10313-2
- Published: 2022-12-02
- Journal:
- Impact factor: 13.900
- Authors: Muhammad Huzaifa Bashir;Aqil M. Azmi;Haq Nawaz;Wajdi Zaghouani;Mona Diab;Ala Al-Fuqaha;Junaid Qadir
- Corresponding author: Junaid Qadir
Combining Discrete Wavelet and Cosine Transforms for Efficient Sentence Embedding
- DOI: 10.5121/csit.2024.141006
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: R. Salama;Abdou Youssef;Mona Diab
- Corresponding author: Mona Diab
Author Correction: Arabic natural language processing for Qur’anic research: a systematic review
- DOI: 10.1007/s10462-023-10390-x
- Published: 2023-03-24
- Journal:
- Impact factor: 13.900
- Authors: Muhammad Huzaifa Bashir;Aqil M. Azmi;Haq Nawaz;Wajdi Zaghouani;Mona Diab;Ala Al-Fuqaha;Junaid Qadir
- Corresponding author: Junaid Qadir
Other grants held by Mona Diab
CI-P: Towards the Creation of a Unified Repository for MultiLingual and CrossLingual Multiword Expressions
- Award number: 1513116
- Fiscal year: 2015
- Amount: $400,000
- Project type: Standard Grant
CI-ADDO-NEW: Collaborative Research: A Repository for Annotating Multilingual Code Switched Data
- Award number: 1205556
- Fiscal year: 2012
- Amount: $400,000
- Project type: Standard Grant
Collaborative Research: CI-P: Creation of an annotated repository of multilingual and multigenre code switched data for several language pairs
- Award number: 0958440
- Fiscal year: 2010
- Amount: $400,000
- Project type: Standard Grant
SGER: Automatic Processing of Natural Language Code Switching
- Award number: 0749062
- Fiscal year: 2007
- Amount: $400,000
- Project type: Standard Grant
Similar overseas grants
CI-ADDO-NEW: Collaborative Research: Development of DARwIn Humanoid Robots for Research, Education and Outreach
- Award number: 1564417
- Fiscal year: 2015
- Amount: $400,000
- Project type: Continuing Grant
CI-ADDO-NEW: Collaborative Research: A Repository for Annotating Multilingual Code Switched Data
- Award number: 1462142
- Fiscal year: 2014
- Amount: $400,000
- Project type: Standard Grant
CI-ADDO-NEW: Collaborative Research: The Speech Recognition Virtual Kitchen
- Award number: 1305215
- Fiscal year: 2013
- Amount: $400,000
- Project type: Standard Grant
CI-ADDO-NEW: ASTERIX: A Community Software Platform for Big Data Research, Analysis, and Management
- Award number: 1305253
- Fiscal year: 2013
- Amount: $400,000
- Project type: Standard Grant
CI-ADDO-NEW: Collaborative Research: The Speech Recognition Virtual Kitchen
- Award number: 1305319
- Fiscal year: 2013
- Amount: $400,000
- Project type: Standard Grant
CI-ADDO-NEW: OCCAM: Open Curation for Computer Architecture Modeling
- Award number: 1305220
- Fiscal year: 2013
- Amount: $400,000
- Project type: Standard Grant
CI-ADDO-NEW: Collaborative Research: WiSER Dynamic Spectrum Access Platform and Infrastructure
- Award number: 1305405
- Fiscal year: 2013
- Amount: $400,000
- Project type: Standard Grant
CI-ADDO-NEW: Collaborative Research: The Speech Recognition Virtual Kitchen
- Award number: 1305365
- Fiscal year: 2013
- Amount: $400,000
- Project type: Standard Grant
CI-ADDO-NEW: Collaborative Research: WiSER Dynamic Spectrum Access Platform and Infrastructure
- Award number: 1305171
- Fiscal year: 2013
- Amount: $400,000
- Project type: Standard Grant
CI-ADDO-NEW: PhantomNet: An End-to-End Mobile Network Testbed
- Award number: 1305384
- Fiscal year: 2013
- Amount: $400,000
- Project type: Standard Grant


















