CI-ADDO-NEW: Collaborative Research: A Repository for Annotating Multilingual Code Switched Data

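Basic Information

  • Award number:
    1205556
  • Principal investigator:
  • Amount:
    $400,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2012
  • Funding country:
    United States
  • Start and end dates:
    2012-09-01 to 2013-06-30
  • Project status:
    Completed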

Project Abstract

Linguistic code switching (LCS) is the practice of switching back and forth between the shared languages of bilingual or multilingual speakers. This phenomenon is particularly prevalent in geographic regions with linguistic boundaries or large immigrant populations. Depending on the language pair and/or genre, LCS may involve various levels of language: phonological, morphological, syntactic, semantic, and discourse-pragmatic. Computational algorithms trained on a single language quickly break down when the input contains LCS. A major barrier to LCS research in computational linguistics (CL) has been the lack of large, accurately annotated LCS corpora. In this project, a large repository of LCS data is collected and a large-scale annotation infrastructure is developed. The data are annotated consistently across modalities (speech and text), at various levels of linguistic granularity, and across language pairs that reflect different linguistic typologies (Standard Arabic and Dialectal Arabic, Arabic-English, Spanish-English, Chinese-English, Hindi-English). The effort focuses on intra-sentential LCS.

This infrastructure and unified, large LCS data resource are eagerly awaited by the CL research community, because annotated LCS data provide a natural test bed for adaptive learning algorithms and for handling diverse data sources, as well as a framework for genuinely multilingual processing. The resource will also benefit sociolinguistic and theoretical linguistics researchers and provide a platform for collaborative interdisciplinary research. Finally, research on LCS helps overcome biases against multilingual speakers by demonstrating their creativity in exploiting their verbal repertoires. This outcome is particularly important for K-12 education and testing policies in the USA, with its diverse immigrant population.

Project Outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other Publications by Mona Diab

Improving Coherence of Language Model Generation with Latent Semantic State
  • DOI:
  • Published:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Amanda Askell;Yuntao Bai;Anna Chen;Dawn Drain;Deep Ganguli;T. Henighan;Andy Jones;Benjamin Mann;Nova Dassarma;Nelson El;Zac Hatfield;Danny Hernandez;John Kernion;Kamal Ndousse;Catherine Olsson;Dario Amodei;Tom Brown;J. Clark;Sam Mc;Chris Olah;Jared Kaplan;Nick Ryder;Jared D Subbiah;Prafulla Kaplan;A. Dhariwal;P. Neelakantan;Girish Shyam;Amanda Sastry;Sandhini Askell;Ariel Agarwal;Herbert;Gretchen Krueger;R. Child;Aditya Ramesh;Daniel M. Ziegler;Jeffrey Wu;Christopher Winter;Mark Hesse;Eric Chen;Mateusz Sigler;Scott teusz Litwin;Benjamin Gray;Jack Chess;Christopher Clark;Sam Berner;Alec McCandlish;Ilya Radford;Sutskever Dario;Amodei;Joshua Maynez;Shashi Narayan;Bernd Bohnet;Kurt Shuster;Spencer Poff;Moya Chen;Douwe Kiela;Shane Storks;Qiaozi Gao;Yichi Zhang;Joyce Chai;Niket Tandon;Keisuke Sakaguchi;Bhavana Dalvi;Dheeraj Rajagopal;Peter Clark;Michal Guerquin;Kyle Richardson;Eduard H. Hovy;A. Dataset;Rowan Zellers;Ari Holtzman;Matthew E. Peters;Roozbeh Mottaghi;Aniruddha Kembhavi;Ali Farhadi;Chunting Zhou;Graham Neubig;Jiatao Gu;Mona Diab;Francisco Guzmán;Luke Zettlemoyer
  • Corresponding author:
    Luke Zettlemoyer
Investigating Cultural Alignment of Large Language Models
  • DOI:
  • Published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Badr AlKhamissi;Muhammad N. ElNokrashy;Mai AlKhamissi;Mona Diab
  • Corresponding author:
    Mona Diab
Arabic natural language processing for Qur’anic research: a systematic review
  • DOI:
    10.1007/s10462-022-10313-2
  • Published:
    2022-12-02
  • Journal:
  • Impact factor:
    13.900
  • Authors:
    Muhammad Huzaifa Bashir;Aqil M. Azmi;Haq Nawaz;Wajdi Zaghouani;Mona Diab;Ala Al-Fuqaha;Junaid Qadir
  • Corresponding author:
    Junaid Qadir
Combining Discrete Wavelet and Cosine Transforms for Efficient Sentence Embedding
Author Correction: Arabic natural language processing for Qur’anic research: a systematic review
  • DOI:
    10.1007/s10462-023-10390-x
  • Published:
    2023-03-24
  • Journal:
  • Impact factor:
    13.900
  • Authors:
    Muhammad Huzaifa Bashir;Aqil M. Azmi;Haq Nawaz;Wajdi Zaghouani;Mona Diab;Ala Al-Fuqaha;Junaid Qadir
  • Corresponding author:
    Junaid Qadir

Other Grants by Mona Diab

CI-P: Towards the Creation of a Unified Repository for MultiLingual and CrossLingual Multiword Expressions
  • Award number:
    1513116
  • Fiscal year:
    2015
  • Award amount:
    $400,000
  • Project type:
    Standard Grant
CI-ADDO-NEW: Collaborative Research: A Repository for Annotating Multilingual Code Switched Data
  • Award number:
    1343530
  • Fiscal year:
    2013
  • Award amount:
    $400,000
  • Project type:
    Standard Grant
Collaborative Research: CI-P: Creation of an annotated repository of multilingual and multigenre code switched data for several language pairs
  • Award number:
    0958440
  • Fiscal year:
    2010
  • Award amount:
    $400,000
  • Project type:
    Standard Grant
SGER: Automatic Processing of Natural Language Code Switching
  • Award number:
    0749062
  • Fiscal year:
    2007
  • Award amount:
    $400,000
  • Project type:
    Standard Grant

Similar Overseas Grants

CI-ADDO-NEW: Collaborative Research: Development of DARwIn Humanoid Robots for Research, Education and Outreach
  • Award number:
    1564417
  • Fiscal year:
    2015
  • Award amount:
    $400,000
  • Project type:
    Continuing Grant
CI-ADDO-NEW: Collaborative Research: A Repository for Annotating Multilingual Code Switched Data
  • Award number:
    1462142
  • Fiscal year:
    2014
  • Award amount:
    $400,000
  • Project type:
    Standard Grant
CI-ADDO-NEW: Collaborative Research: A Repository for Annotating Multilingual Code Switched Data
  • Award number:
    1343530
  • Fiscal year:
    2013
  • Award amount:
    $400,000
  • Project type:
    Standard Grant
CI-ADDO-NEW: Collaborative Research: The Speech Recognition Virtual Kitchen
  • Award number:
    1305215
  • Fiscal year:
    2013
  • Award amount:
    $400,000
  • Project type:
    Standard Grant
CI-ADDO-NEW: ASTERIX: A Community Software Platform for Big Data Research, Analysis, and Management
  • Award number:
    1305253
  • Fiscal year:
    2013
  • Award amount:
    $400,000
  • Project type:
    Standard Grant
CI-ADDO-NEW: Collaborative Research: The Speech Recognition Virtual Kitchen
  • Award number:
    1305319
  • Fiscal year:
    2013
  • Award amount:
    $400,000
  • Project type:
    Standard Grant
CI-ADDO-NEW: OCCAM: Open Curation for Computer Architecture Modeling
  • Award number:
    1305220
  • Fiscal year:
    2013
  • Award amount:
    $400,000
  • Project type:
    Standard Grant
CI-ADDO-NEW: Collaborative Research: WiSER Dynamic Spectrum Access Platform and Infrastructure
  • Award number:
    1305405
  • Fiscal year:
    2013
  • Award amount:
    $400,000
  • Project type:
    Standard Grant
CI-ADDO-NEW: Collaborative Research: The Speech Recognition Virtual Kitchen
  • Award number:
    1305365
  • Fiscal year:
    2013
  • Award amount:
    $400,000
  • Project type:
    Standard Grant
CI-ADDO-NEW: Collaborative Research: WiSER Dynamic Spectrum Access Platform and Infrastructure
  • Award number:
    1305171
  • Fiscal year:
    2013
  • Award amount:
    $400,000
  • Project type:
    Standard Grant