Natural Language Processing at the Sub-Word Level
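Basic Information
- Grant number: 261284-2012
- Principal investigator:
- Amount: $12,400
- Host institution:
- Host institution country: Canada
- Project category: Discovery Grants Program - Individual
- Fiscal year: 2015
- Funding country: Canada
- Duration: 2015-01-01 to 2016-12-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract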
Processing language involves processing words. However, words are not indivisible abstract atoms:
they are made of smaller units, such as morphemes, syllables, letters, and phonemes. In computational
linguistics, words are analyzed on distinct levels: phonetic, phonological, orthographic,
morphological, etc. These levels of representation underlie such important tasks as morphological
parsing, speech synthesis and recognition, spell-checking, and stemming. Since different levels are
strongly inter-related, the understanding of the interactions between them is crucial to advancing the
state of the art in word-oriented applications. My objective is to investigate such interactions, and
develop algorithms for alignment and conversion between levels. In particular, I will focus on the
tasks of grapheme-to-phoneme conversion and transliteration.
My general approach to the problem will be to incorporate linguistic knowledge into advanced
machine-learning techniques, which, given sufficient training data, substantially outperform
rule-based approaches. However, the latter often achieve impressive accuracy without any need for
training data. They also tend to perform well across different domains. I will explore ways of
combining the two paradigms by using linguistic understanding to guide and inform machine-learning
approaches, as well as to pre- and post-process their training data.
The long-term goals of this research programme are to acquire a deeper understanding of the dependencies
between various representations and to leverage the resulting insights to advance the state of
the art in natural language processing. I am confident that the implementation of the planned projects
will result in successful applications in other areas of natural language processing.
Project Outcomes
Journal papers (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants held by Kondrak, Grzegorz
Overcoming Data Sparsity in Machine Translation
- Grant number: RGPIN-2017-05875
- Fiscal year: 2021
- Amount: $12,400
- Project category: Discovery Grants Program - Individual
Overcoming Data Sparsity in Machine Translation
- Grant number: RGPIN-2017-05875
- Fiscal year: 2020
- Amount: $12,400
- Project category: Discovery Grants Program - Individual
Overcoming Data Sparsity in Machine Translation
- Grant number: RGPIN-2017-05875
- Fiscal year: 2019
- Amount: $12,400
- Project category: Discovery Grants Program - Individual
Overcoming Data Sparsity in Machine Translation
- Grant number: RGPIN-2017-05875
- Fiscal year: 2018
- Amount: $12,400
- Project category: Discovery Grants Program - Individual
Overcoming Data Sparsity in Machine Translation
- Grant number: RGPIN-2017-05875
- Fiscal year: 2017
- Amount: $12,400
- Project category: Discovery Grants Program - Individual
Natural Language Processing at the Sub-Word Level
- Grant number: 261284-2012
- Fiscal year: 2016
- Amount: $12,400
- Project category: Discovery Grants Program - Individual
Natural Language Processing at the Sub-Word Level
- Grant number: 261284-2012
- Fiscal year: 2014
- Amount: $12,400
- Project category: Discovery Grants Program - Individual
Natural Language Processing at the Sub-Word Level
- Grant number: 261284-2012
- Fiscal year: 2013
- Amount: $12,400
- Project category: Discovery Grants Program - Individual
Natural Language Processing at the Sub-Word Level
- Grant number: 261284-2012
- Fiscal year: 2012
- Amount: $12,400
- Project category: Discovery Grants Program - Individual
Word form similarity computation and application in natural language processing
- Grant number: 261284-2007
- Fiscal year: 2011
- Amount: $12,400
- Project category: Discovery Grants Program - Individual
Similar Overseas Grants
REU Site: Recent Advances in Natural Language Processing
- Grant number: 2349452
- Fiscal year: 2024
- Amount: $12,400
- Project category: Standard Grant
Navigating Chemical Space with Natural Language Processing and Deep Learning
- Grant number: EP/Y004167/1
- Fiscal year: 2024
- Amount: $12,400
- Project category: Research Grant
Collaborative Research: EAGER: Developing and Optimizing Reflection-Informed STEM Learning and Instruction by Integrating Learning Technologies with Natural Language Processing
- Grant number: 2329273
- Fiscal year: 2023
- Amount: $12,400
- Project category: Standard Grant
SBIR Phase I: Sown To Grow - Measuring Growth in Trusting Relationships between Students and Educators with Natural Language Processing and Machine Learning Technologies
- Grant number: 2322340
- Fiscal year: 2023
- Amount: $12,400
- Project category: Standard Grant
Studies of speech, image and natural language processing for multimodal spoken document retrieval
- Grant number: 23K11216
- Fiscal year: 2023
- Amount: $12,400
- Project category: Grant-in-Aid for Scientific Research (C)
Efficient and Fair Language Modelling for Natural Language Processing, investigating lightweight language modelling approaches and aiming at fairness
- Grant number: 2894795
- Fiscal year: 2023
- Amount: $12,400
- Project category: Studentship
Harmony AI: Natural Language Processing Enabling Advanced Biomanufacturing
- Grant number: 10761082
- Fiscal year: 2023
- Amount: $12,400
- Project category:
Applying Natural Language Processing to real-world patient data to optimise cancer care
- Grant number: 2897525
- Fiscal year: 2023
- Amount: $12,400
- Project category: Studentship
CAREER: Data-driven design of graphene oxide for environmental applications enabled by natural language processing and machine learning techniques
- Grant number: 2238415
- Fiscal year: 2023
- Amount: $12,400
- Project category: Continuing Grant
Collaborative Research: EAGER: Developing and Optimizing Reflection-Informed STEM Learning and Instruction by Integrating Learning Technologies with Natural Language Processing
- Grant number: 2329274
- Fiscal year: 2023
- Amount: $12,400
- Project category: Standard Grant