Collaborative Research: Language Documentation with an Artificial Intelligence (AI) Helper
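Basic Information
- Award number: 2109578
- Principal investigator:
- Amount: $239,700
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2021
- Funding country: United States
- Start and end dates: 2021-09-01 to 2025-02-28
- Project status: Ongoing
- Source:
- Keywords:
Project Abstract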
Documentation of languages, especially endangered languages, is crucial for conserving humanity’s knowledge and cultural heritage, as well as for advancing an understanding of human language. Traditional documentation methods produce invaluable materials such as grammars, dictionaries, and annotated texts, but require more time than can be afforded to keep up with current language extinction rates. The most constructive response to this crisis is to complement documentation efforts by collecting data for as many languages as possible now and to make them accessible and interpretable so that they can be studied later by both linguists and members of the language communities. Digital technologies make it practical to obtain many hours of recordings in an endangered language along with translations. This project advances technologies for analyzing the recordings at the sub-word, word, and clause level so that they become accessible for a wide variety of documentary purposes.

The project makes the information in digital recordings more interpretable for further linguistic analysis in three ways. First, the team is devising computational methods to automatically derive a basic phonological understanding and produce phonetic representations for languages, even if they do not have an established writing system. Second, the team is developing methods to automatically analyze the internal structure of words in languages where this structure is highly complex. Third, the team uses knowledge of more widely spoken languages to analyze related endangered languages. The resulting tool, the AI-helper toolbox, will be packaged with software that is currently widely in use by linguists and language communities in the language documentation process. All tools will be accessible through a web-based interface and the source code will be publicly available through GitHub.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outcomes
Journal articles (6)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Revisiting the Effects of Leakage on Dependency Parsing
- DOI:
- Published: 2022
- Journal:
- Impact factor: 10.9
- Authors: Krasner, Nathaniel; Wanner, Miriam; Anastasopoulos, Antonios
- Corresponding author: Anastasopoulos, Antonios
Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages
- DOI: 10.21437/interspeech.2023-1979
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Sikasote, Claytone; Siaminwe, Kalinda; Mwape, Stanly; Zulu, Bangiwe; Phiri, Mofya; Phiri, Martin; Zulu, David; Nyirenda, Mayumbo; Anastasopoulos, Antonios
- Corresponding author: Anastasopoulos, Antonios
Script Normalization for Unconventional Writing of Under-Resourced Languages in Bilingual Communities
- DOI: 10.18653/v1/2023.acl-long.809
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Ahmadi, Sina; Anastasopoulos, Antonios
- Corresponding author: Anastasopoulos, Antonios
PALI: A Language Identification Benchmark for Perso-Arabic Scripts
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Ahmadi, Sina; Agarwal, Milind; Anastasopoulos, Antonios
- Corresponding author: Anastasopoulos, Antonios
Approaches to Corpus Creation for Low-Resource Language Technology: the Case of Southern Kurdish and Laki
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Ahmadi, Sina; Azin, Zahra; Belelli, Sara; Anastasopoulos, Antonios
- Corresponding author: Anastasopoulos, Antonios
Other publications by Antonios Anastasopoulos
PROBER: A System for Real-time Propaganda Behavior Analytics on Social Media and Web Data Streams
- DOI:
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: Yasas Senarath; Antonios Anastasopoulos; Tonya Thornton; Hemant Purohit
- Corresponding author: Hemant Purohit
Noisy Parallel Data Alignment
- DOI: 10.48550/arxiv.2301.09685
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Ruoyu Xie; Antonios Anastasopoulos
- Corresponding author: Antonios Anastasopoulos
Flagging Comprehensibility Issues in Hindi Text with Question Answering
- DOI:
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Antonios Anastasopoulos; A. Cattelan; Yi Dou; Marcello Federico; Christian Federman; Dmitriy Genzel; Francisco Guzmán; Junjie Hu; Sheila Castilho; Stephen Doherty; F. Gaspari; J. Devlin; Ming; Kenton Lee; Natasha Dhawan; I. Subbiah; Benjamin Thompson; Zachary Hildner; Areeba; Eric Prommer; Christian T Sinclair
- Corresponding author: Christian T Sinclair
To token or not to token: A Comparative Study of Text Representations for Cross-Lingual Transfer
- DOI: 10.48550/arxiv.2310.08078
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Md Mushfiqur Rahman; Fardin Ahsan Sakib; FAHIM FAISAL; Antonios Anastasopoulos
- Corresponding author: Antonios Anastasopoulos
Phylogeny-Inspired Adaptation of Multilingual Models to New Languages
- DOI:
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: FAHIM FAISAL; Antonios Anastasopoulos
- Corresponding author: Antonios Anastasopoulos
Other grants by Antonios Anastasopoulos
EAGER: Building Language Technologies by Machine Reading Grammars
- Award number: 2327143
- Fiscal year: 2023
- Award amount: $239,700
- Project type: Standard Grant
CCRI: Planning-C: Facilitating Language Technologies for Crisis Response (LT4CR)
- Award number: 2234895
- Fiscal year: 2023
- Award amount: $239,700
- Project type: Standard Grant
Collaborative Research: RI: Small: NL(V)P: Natural Language (Variety) Processing
- Award number: 2125466
- Fiscal year: 2021
- Award amount: $239,700
- Project type: Standard Grant
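Similar grants from the National Natural Science Foundation of China (NSFC)
Research on Quantum Field Theory without a Lagrangian Description
- Award number: 24ZR1403900
- Approval year: 2024
- Award amount: CNY 0
- Project type: Provincial/municipal-level project
Cell Research
- Award number: 31224802
- Approval year: 2012
- Award amount: CNY 240,000
- Project type: Special fund project
Cell Research
- Award number: 31024804
- Approval year: 2010
- Award amount: CNY 240,000
- Project type: Special fund project
Cell Research
- Award number: 30824808
- Approval year: 2008
- Award amount: CNY 240,000
- Project type: Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
- Award number: 10774081
- Approval year: 2007
- Award amount: CNY 450,000
- Project type: General Program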
Similar overseas grants
Collaborative Research: Conference: Large Language Models for Biological Discoveries (LLMs4Bio)
- Award number: 2411529
- Fiscal year: 2024
- Award amount: $239,700
- Project type: Standard Grant
Collaborative Research: Conference: Large Language Models for Biological Discoveries (LLMs4Bio)
- Award number: 2411530
- Fiscal year: 2024
- Award amount: $239,700
- Project type: Standard Grant
Collaborative Research: SHF: Medium: Toward Understandability and Interpretability for Neural Language Models of Source Code
- Award number: 2423813
- Fiscal year: 2024
- Award amount: $239,700
- Project type: Standard Grant
Collaborative Research: Inverse Task Planning from Few-Shot Vision Language Demonstrations
- Award number: 2327974
- Fiscal year: 2024
- Award amount: $239,700
- Project type: Standard Grant
Collaborative Research: Inverse Task Planning from Few-Shot Vision Language Demonstrations
- Award number: 2327973
- Fiscal year: 2024
- Award amount: $239,700
- Project type: Standard Grant
Collaborative Research: SHF: Medium: Toward Understandability and Interpretability for Neural Language Models of Source Code
- Award number: 2311468
- Fiscal year: 2023
- Award amount: $239,700
- Project type: Standard Grant
Collaborative Research: A longitudinal approach to examining perception-production links in second language speech sound learning.
- Award number: 2309561
- Fiscal year: 2023
- Award amount: $239,700
- Project type: Standard Grant
Collaborative Research: Education DCL: EAGER: Harnessing the Power of Large Language Models in Digital Forensics Education at MSI and HBCU
- Award number: 2333951
- Fiscal year: 2023
- Award amount: $239,700
- Project type: Standard Grant
Collaborative Research: EAGER: Developing and Optimizing Reflection-Informed STEM Learning and Instruction by Integrating Learning Technologies with Natural Language Processing
- Award number: 2329273
- Fiscal year: 2023
- Award amount: $239,700
- Project type: Standard Grant
Collaborative Research: Quantifying sign reduction in sign language using human pose estimation
- Award number: 2234787
- Fiscal year: 2023
- Award amount: $239,700
- Project type: Standard Grant
