CAREER: Modeling Language Evolution via Deep Probabilistic Factorization

Basic Information

Project Abstract

The broad diversity of contemporary and historical languages, dialects, and writing systems presents a daunting challenge for artificial intelligence (AI) systems that must process language data (e.g., systems that automatically recognize handwriting or translate from one language to another). Within this diversity, however, there are strong patterns of regularity. Historical linguists have shown that many aspects of language evolve over time in accordance with regular patterns of change, including spoken language, spelling, and even the visual appearance of symbols. This project aims to develop novel AI frameworks that can better understand language diversity by automatically analyzing large and diverse datasets consisting of many languages, dialects, and writing systems. The project will result in a collection of new AI systems that track how visual and textual aspects of language evolve over time in order to (1) provide a better understanding of how languages change and develop and (2) make downstream AI systems more robust to language diversity. Finally, this research will also support interdisciplinary training of a diverse set of graduate students at the University of California San Diego, as well as the development of interdisciplinary educational modules for high school students interested in AI.

This CAREER project will develop a novel computational framework that combines methods from matrix and tensor factorization with deep generative modeling techniques to support analysis of language evolution over a broad range of languages, dialects, and writing systems. The project will create a learning paradigm that (1) incorporates prior phylogenetic knowledge of language history as structured priors, (2) supports efficient approximate inference of historical language forms using neural decoders, (3) is easily portable to a variety of linguistic domains and levels of language representation, and (4) directly analyzes primary data (e.g., images of signs) rather than manually curated feature lists. Further, the framework will generalize across both visual and textual modalities, allowing for study of the multi-modal nature of language evolution (e.g., scripts evolve through visual change, cognates through phonetic or orthographic change) and potentially laying the groundwork for future work investigating how script and dialect co-evolve, or for cultural evolution studies of spoken audio. Finally, the outcomes of each of several applied studies may lead to new evidence for specific historical and paleographic hypotheses.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Taylor Berg-Kirkpatrick

Collaborative Research: RI: Small: Unsupervised Islamicate Manuscript Transcription via Lacunae Reconstruction
  • Award number:
    2200333
  • Fiscal year:
    2022
  • Funding amount:
    $600,000
  • Project type:
    Standard Grant
RI: Small: Print and Probability - A Statistical Approach to Analysis of Clandestine Publication
  • Award number:
    1936155
  • Fiscal year:
    2019
  • Funding amount:
    $600,000
  • Project type:
    Standard Grant
RI: Small: Print and Probability - A Statistical Approach to Analysis of Clandestine Publication
  • Award number:
    1816311
  • Fiscal year:
    2018
  • Funding amount:
    $600,000
  • Project type:
    Standard Grant
RI: Small: Collaborative Research: Unsupervised Transcription of Early Modern Documents
  • Award number:
    1618044
  • Fiscal year:
    2016
  • Funding amount:
    $600,000
  • Project type:
    Standard Grant

Similar National Natural Science Foundation of China (NSFC) grants

Galaxy Analytical Modeling Evolution (GAME) and cosmological hydrodynamic simulations.
  • Award number:
  • Approval year:
    2025
  • Funding amount:
    CNY 100,000
  • Project type:
    Provincial/municipal-level project

Similar overseas grants

Comprehension and application of the 3D environments through language modeling grounded to the real world
  • Award number:
    22KK0184
  • Fiscal year:
    2023
  • Funding amount:
    $600,000
  • Project type:
    Fund for the Promotion of Joint International Research (Fostering Joint International Research (A))
Study of Human Statistical Biases on Unsupervised Parsing and Language Modeling
  • Award number:
    23KJ0565
  • Fiscal year:
    2023
  • Funding amount:
    $600,000
  • Project type:
    Grant-in-Aid for JSPS Fellows
Physical modeling, excised larynx, mathematical modeling of voice instability induced by vocal membranes and their implications to the evolution of language
  • Award number:
    23H03424
  • Fiscal year:
    2023
  • Funding amount:
    $600,000
  • Project type:
    Grant-in-Aid for Scientific Research (B)
CAREER: Modeling Spoken Language Without Parallel Text Annotations
  • Award number:
    2238605
  • Fiscal year:
    2023
  • Funding amount:
    $600,000
  • Project type:
    Continuing Grant
Computational modeling of language impairment and control in bilingual individuals with post-stroke aphasia and neurodegenerative disorders
  • Award number:
    10680656
  • Fiscal year:
    2023
  • Funding amount:
    $600,000
  • Project type:
Linguistic signifying in spontaneous oral interactions and its pedagogical modeling in French as a Foreign Language
  • Award number:
    22K13162
  • Fiscal year:
    2022
  • Funding amount:
    $600,000
  • Project type:
    Grant-in-Aid for Early-Career Scientists
Adapting Hierarchical Circuits from Planning to Language with Computational Modeling
  • Award number:
    10709065
  • Fiscal year:
    2022
  • Funding amount:
    $600,000
  • Project type:
Modeling Temporality with Natural Language Processing to Predict Readmission Risk of Patients with Psychosis
  • Award number:
    10445583
  • Fiscal year:
    2022
  • Funding amount:
    $600,000
  • Project type:
SAI-P: Overcoming Barriers to User-Centered Infrastructure Planning with System Modeling and Natural Language Processing
  • Award number:
    2228783
  • Fiscal year:
    2022
  • Funding amount:
    $600,000
  • Project type:
    Standard Grant
Modeling Temporality with Natural Language Processing to Predict Readmission Risk of Patients with Psychosis
  • Award number:
    10669207
  • Fiscal year:
    2022
  • Funding amount:
    $600,000
  • Project type: