NSF-BSF: Collaborative Research: RI: Small: Multilingual Language Generation via Understanding of Code Switching

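Basic Information

  • Award number:
    2203097
  • Principal investigator:
  • Amount:
    $345,600
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Period:
    2021-10-01 to 2024-12-31
  • Status:
    Completed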

Project Abstract

Human language technology has recently matured to the extent that computational systems can generally interact with users in ways that are natural to humans, not just to machines. However, most people in the world today are multilingual, and current approaches to language technology do not reflect the reality that multilingual communication is ubiquitous; that is, current technology can interact naturally with monolingual speakers, but not with multilingual ones. Computational systems should be able to generate language that sounds equally natural to these users, and this includes being able to accommodate nonnative speakers. This project first creates a large-scale, broad coverage dataset, reflecting conversations between humans and an automatic system that is sophisticated enough to generate fluent multilingual (i.e. 'code-switched') utterances, but is simple enough for controlled experiments. The dataset is far larger than ones that are currently available, and is based on a much more detailed understanding of language-switching strategies. Second, this dataset is used to develop new methods to incorporate code-switching into contemporary deep-learning language generation, including dialogue systems, question answering, assistive technologies, summarization and machine translation. This innovation should benefit a dramatic number of multilingual computer users, including less privileged users who are currently required to interact with machines in a language they do not speak fluently. Successful completion of the research program will pave the way for the development of natural language technologies that are more accommodating to such users, building bridges over the digital divide.
The overarching goal of this project is to develop multilingual and contextualized language generation technologies that are more controllable and more adaptable to multilingual users. The project achieves this goal by completing the following objectives. (1) It develops psycholinguistically-grounded, scalable approaches to collecting corpora for studying how multilingual speakers adapt to each other's linguistic choices in text conversations. These methodologies are employed to collect large-scale, rich datasets of multilingual human-machine conversations. These datasets, as well as additional corpora of human code-switched interactions, should shed new light on the theoretical understanding of cross-lingual usage patterns, allowing for better understanding of how people employ code-switching in written language. (2) It uses the linguistic insights obtained through this endeavor to define classifiers that predict code-switching. (3) Novel approaches are developed for efficient, large-vocabulary neural language generation that incorporate these classifiers, allowing generation systems to introduce code-switching in a way that sounds natural to multilingual users. Consequently, this project should dramatically advance our understanding of code-switching, especially in the relatively unexplored territory of written dialogue. In addition, its contributions benefit a broad range of applications that rely on language generation, including dialogue systems, question answering, assistive technologies, summarization and machine translation.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (14)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey
  • DOI:
    10.48550/arxiv.2210.07700
  • Publication date:
    2022-10
  • Journal:
  • Impact factor:
    0
  • Authors:
    Sachin Kumar;Vidhisha Balachandran;Lucille Njoo;Antonios Anastasopoulos;Yulia Tsvetkov
  • Corresponding author:
    Sachin Kumar;Vidhisha Balachandran;Lucille Njoo;Antonios Anastasopoulos;Yulia Tsvetkov
LEXPLAIN: Improving Model Explanations via Lexicon Supervision
  • DOI:
    10.18653/v1/2023.starsem-1.19
  • Publication date:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Orevaoghene Ahia;Hila Gonen;Vidhisha Balachandran;Yulia Tsvetkov;Noah A. Smith
  • Corresponding author:
    Orevaoghene Ahia;Hila Gonen;Vidhisha Balachandran;Yulia Tsvetkov;Noah A. Smith
Machine Translation into Low-resource Language Varieties
  • DOI:
    10.18653/v1/2021.acl-short.16
  • Publication date:
    2021-06
  • Journal:
  • Impact factor:
    0
  • Authors:
    Sachin Kumar;Antonios Anastasopoulos;S. Wintner;Yulia Tsvetkov
  • Corresponding author:
    Sachin Kumar;Antonios Anastasopoulos;S. Wintner;Yulia Tsvetkov
SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control
  • DOI:
    10.48550/arxiv.2210.17432
  • Publication date:
    2022-10
  • Journal:
  • Impact factor:
    0
  • Authors:
    Xiaochuang Han;Sachin Kumar;Yulia Tsvetkov
  • Corresponding author:
    Xiaochuang Han;Sachin Kumar;Yulia Tsvetkov
KALM: Knowledge-Aware Integration of Local, Document, and Global Contexts for Long Document Understanding
  • DOI:
    10.48550/arxiv.2210.04105
  • Publication date:
    2022-10
  • Journal:
  • Impact factor:
    0
  • Authors:
    Shangbin Feng;Zhaoxuan Tan;Wenqian Zhang;Zhenyu Lei;Yulia Tsvetkov
  • Corresponding author:
    Shangbin Feng;Zhaoxuan Tan;Wenqian Zhang;Zhenyu Lei;Yulia Tsvetkov

Other publications by Yulia Tsvetkov

Style Transfer Through Multilingual and Feedback-Based Back-Translation
  • DOI:
  • Publication date:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Shrimai Prabhumoye;Yulia Tsvetkov;A. Black;R. Salakhutdinov
  • Corresponding author:
    R. Salakhutdinov
LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for Multi-Granular Propaganda Span Identification
  • DOI:
    10.18653/v1/2020.semeval-1.230
  • Publication date:
    2020
  • Journal:
  • Impact factor:
    0
  • Authors:
    Sopan Khosla;Rishabh Joshi;Ritam Dutt;A. Black;Yulia Tsvetkov
  • Corresponding author:
    Yulia Tsvetkov
RtGender: A Corpus for Studying Differential Responses to Gender
A Dynamic Strategy Coach for Effective Negotiation
Extraction of Multi-word Expressions from Small Parallel Corpora
  • DOI:
  • Publication date:
    2010
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yulia Tsvetkov
  • Corresponding author:
    Yulia Tsvetkov

Other grants by Yulia Tsvetkov

CAREER: Language Technologies Against the Language of Social Discrimination
  • Award number:
    2142739
  • Fiscal year:
    2022
  • Amount:
    $345,600
  • Award type:
    Continuing Grant
Collaborative Research: RI: Small: NL(V)P: Natural Language (Variety) Processing
  • Award number:
    2125201
  • Fiscal year:
    2021
  • Amount:
    $345,600
  • Award type:
    Standard Grant
NSF-BSF: Collaborative Research: RI: Small: Multilingual Language Generation via Understanding of Code Switching
  • Award number:
    2007960
  • Fiscal year:
    2020
  • Amount:
    $345,600
  • Award type:
    Standard Grant
NSF-BSF: RI: Small: Collaborative Research: Modeling Crosslinguistic Influences Between Language Varieties
  • Award number:
    1812327
  • Fiscal year:
    2018
  • Amount:
    $345,600
  • Award type:
    Continuing Grant

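Similar NSFC (National Natural Science Foundation of China) grants

Study of the intraspecies quorum-sensing mechanism in beta-cypermethrin degradation by Bacillus subtilis BSF01
  • Award number:
    31871988
  • Year approved:
    2018
  • Amount:
    CNY 590,000
  • Award type:
    General Program
Fundamental research on light-induced degradation and its suppression in Al-BSF and PERC solar cells based on boron-doped Czochralski monocrystalline silicon wafers
  • Award number:
    61774171
  • Year approved:
    2017
  • Amount:
    CNY 630,000
  • Award type:
    General Program
The relationship between B-cell stimulatory factor 2 (BSF-2) and autoimmune diseases
  • Award number:
    38870708
  • Year approved:
    1988
  • Amount:
    CNY 30,000
  • Award type:
    General Program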

Similar overseas grants

Collaborative Research: NSF-BSF: Under Pressure: The evolution of guard cell turgor and the rise of the angiosperms
  • Award number:
    2333889
  • Fiscal year:
    2024
  • Amount:
    $345,600
  • Award type:
    Standard Grant
Collaborative Research: NSF-BSF: Under Pressure: The evolution of guard cell turgor and the rise of the angiosperms
  • Award number:
    2333888
  • Fiscal year:
    2024
  • Amount:
    $345,600
  • Award type:
    Continuing Grant
Collaborative Research: NSF-BSF: How cell adhesion molecules control neuronal circuit wiring: Binding affinities, binding availability and sub-cellular localization
  • Award number:
    2321481
  • Fiscal year:
    2024
  • Amount:
    $345,600
  • Award type:
    Continuing Grant
Collaborative Research: NSF-BSF: How cell adhesion molecules control neuronal circuit wiring: Binding affinities, binding availability and sub-cellular localization
  • Award number:
    2321480
  • Fiscal year:
    2024
  • Amount:
    $345,600
  • Award type:
    Continuing Grant
NSF-BSF: Collaborative Research: Solids and reactive transport processes in sewer systems of the future: modeling and experimental investigation
  • Award number:
    2134594
  • Fiscal year:
    2024
  • Amount:
    $345,600
  • Award type:
    Standard Grant
NSF-BSF: Collaborative Research: AF: Small: Algorithmic Performance through History Independence
  • Award number:
    2420942
  • Fiscal year:
    2024
  • Amount:
    $345,600
  • Award type:
    Standard Grant
Collaborative Research: NSF-BSF: SaTC: CORE: Small: Detecting malware with machine learning models efficiently and reliably
  • Award number:
    2338301
  • Fiscal year:
    2024
  • Amount:
    $345,600
  • Award type:
    Continuing Grant
Collaborative Research: NSF-BSF: SaTC: CORE: Small: Detecting malware with machine learning models efficiently and reliably
  • Award number:
    2338302
  • Fiscal year:
    2024
  • Amount:
    $345,600
  • Award type:
    Continuing Grant
Collaborative Research: NSF-BSF: Under Pressure: The evolution of guard cell turgor and the rise of the angiosperms
  • Award number:
    2333890
  • Fiscal year:
    2024
  • Amount:
    $345,600
  • Award type:
    Standard Grant
NSF-BSF: Collaborative Research: Solids and reactive transport processes in sewer systems of the future: modeling and experimental investigation
  • Award number:
    2134747
  • Fiscal year:
    2024
  • Amount:
    $345,600
  • Award type:
    Standard Grant