Overcoming Data Sparsity in Machine Translation
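Basic Information
- Grant number: RGPIN-2017-05875
- Principal investigator: Kondrak, Grzegorz
- Amount: $16,800
- Host institution country: Canada
- Program: Discovery Grants Program - Individual
- Fiscal year: 2019
- Funding country: Canada
- Period: 2019-01-01 to 2020-12-31
- Status: Completed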
Project Abstract
Canada is a multicultural society. A large percentage of Canadian residents report a mother tongue other than English or French. In addition, Canada is home to a rich variety of Indigenous languages, some of which have also been granted official status. Everyone has the right to receive all official federal government services, publications, and documents in both English and French. Important information for new Canadians is often provided in multiple languages and scripts. Making more texts available in Aboriginal languages increases the prestige of those languages, and thus helps preserve them.
As a consequence, there is an acute need for accurate and rapid translation, not only between English and French, but also into other languages. Human translation is slow and expensive, and requires highly skilled experts. Computer translation programs, known as machine translation, have the potential to fill this gap. Unfortunately, the current technology is far from perfect. The quality of translations involving smaller languages is often poor, and even between major languages it is sometimes inadequate for technical applications.
Two of the reasons for the low quality of machine translation are the scarcity of bilingual texts for low-resource languages and the prevalence of infrequent words, such as certain verb inflections in French. The dominant statistical machine translation approach, which is used in web programs such as Google Translate, struggles to translate words that occur only rarely in bilingual texts.
The objective of this proposal is to improve the quality of machine translation by improving the handling of infrequent words. The principal research directions are incorporating state-of-the-art morphological techniques into the translation process, developing lexicon induction methods, and translating out-of-vocabulary words with cutting-edge algorithms for cognate identification, name transliteration, and decipherment.
In the current global economy, the enormous demand for fast and freely available translations can only be satisfied by machine translation programs. The solutions that I outline in my proposal will not only improve the quality of machine translation, but also influence research on other aspects of natural language processing, thus accelerating progress towards the goal of making computers understand human language.
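Cognate identification, one of the techniques named in the research directions above, is commonly approached through orthographic similarity between words of the two languages. The sketch below is a minimal illustration under stated assumptions, not the method proposed in this grant: it scores word pairs with the longest common subsequence ratio (LCSR), i.e., the length of the longest common subsequence divided by the length of the longer word, and keeps each source word's best-scoring target word as a candidate cognate if the score reaches a threshold. The helper `candidate_cognates` and the 0.5 threshold are hypothetical choices made only for illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the prefix of a processed so far
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1  # extend a common subsequence
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a character in a or b
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """Longest common subsequence ratio: LCS length over the longer word's length."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 0.0


def candidate_cognates(source_words, target_words, threshold=0.5):
    """Hypothetical helper: pair each source word with its most similar target
    word and keep the pair only if its LCSR reaches the assumed threshold."""
    if not target_words:
        return []
    pairs = []
    for s in source_words:
        best = max(target_words, key=lambda t: lcsr(s.lower(), t.lower()))
        score = lcsr(s.lower(), best.lower())
        if score >= threshold:
            pairs.append((s, best, round(score, 2)))
    return pairs


print(candidate_cognates(["government", "language"], ["gouvernement", "langue", "maison"]))
# → [('government', 'gouvernement', 0.83), ('language', 'langue', 0.75)]
```

A plain string-similarity score like this ignores systematic sound and spelling correspondences between languages; it only shows how orthographically similar words can be paired even when one of them is too rare in a bilingual corpus to be learned reliably from co-occurrence.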
Project Outputs
Journal articles (0)
Monographs (0)
Awards (0)
Conference papers (0)
Patents (0)
Other grants held by Kondrak, Grzegorz
Overcoming Data Sparsity in Machine Translation
- Grant number: RGPIN-2017-05875
- Fiscal year: 2021
- Amount: $16,800
- Program: Discovery Grants Program - Individual
Overcoming Data Sparsity in Machine Translation
- Grant number: RGPIN-2017-05875
- Fiscal year: 2020
- Amount: $16,800
- Program: Discovery Grants Program - Individual
Overcoming Data Sparsity in Machine Translation
- Grant number: RGPIN-2017-05875
- Fiscal year: 2018
- Amount: $16,800
- Program: Discovery Grants Program - Individual
Overcoming Data Sparsity in Machine Translation
- Grant number: RGPIN-2017-05875
- Fiscal year: 2017
- Amount: $16,800
- Program: Discovery Grants Program - Individual
Natural Language Processing at the Sub-Word Level
- Grant number: 261284-2012
- Fiscal year: 2016
- Amount: $16,800
- Program: Discovery Grants Program - Individual
Natural Language Processing at the Sub-Word Level
- Grant number: 261284-2012
- Fiscal year: 2015
- Amount: $16,800
- Program: Discovery Grants Program - Individual
Natural Language Processing at the Sub-Word Level
- Grant number: 261284-2012
- Fiscal year: 2014
- Amount: $16,800
- Program: Discovery Grants Program - Individual
Natural Language Processing at the Sub-Word Level
- Grant number: 261284-2012
- Fiscal year: 2013
- Amount: $16,800
- Program: Discovery Grants Program - Individual
Natural Language Processing at the Sub-Word Level
- Grant number: 261284-2012
- Fiscal year: 2012
- Amount: $16,800
- Program: Discovery Grants Program - Individual
Word form similarity computation and application in natural language processing
- Grant number: 261284-2007
- Fiscal year: 2011
- Amount: $16,800
- Program: Discovery Grants Program - Individual
Similar NSFC (National Natural Science Foundation of China) Grants
Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
- Approval year: 2024
- Program: Collaborative Innovation Research Team
Data-driven Recommendation System Construction of an Online Medical Platform Based on the Fusion of Information
- Approval year: 2024
- Program: Research Fund for International Young Scientists
Development of a Linear Stochastic Model for Wind Field Reconstruction from Limited Measurement Data
- Approval year: 2020
- Amount: CNY 400,000
Key Technologies for Semantic Interoperability of Web Services Based on Linked Open Data
- Grant number: 61373035
- Approval year: 2013
- Amount: CNY 770,000
- Program: General Program
Molecular Interaction Reconstruction of Rheumatoid Arthritis Therapies Using Clinical Data
- Grant number: 31070748
- Approval year: 2010
- Amount: CNY 340,000
- Program: General Program
Functional Data Analysis Methods for High-Dimensional Data
- Grant number: 11001084
- Approval year: 2010
- Amount: CNY 160,000
- Program: Young Scientists Fund
Role of the Chromosome Replication Negative Regulator datA in the Cell Cycle
- Grant number: 31060015
- Approval year: 2010
- Amount: CNY 250,000
- Program: Fund for Less Developed Regions
Computational Methods for Analyzing Toponome Data
- Grant number: 60601030
- Approval year: 2006
- Amount: CNY 170,000
- Program: Young Scientists Fund
Similar Overseas Grants
OAC Core: Improving Data Integrity for HPC Datasets using Sparsity Profile
- Grant number: 2312982
- Fiscal year: 2023
- Amount: $16,800
- Program: Standard Grant
CRII: CNS: Exploring Data and Model Sparsity in Deep Learning Systems using Graphs
- Grant number: 2245849
- Fiscal year: 2023
- Amount: $16,800
- Program: Standard Grant
Adaptive Dependent Data Models via Graph-Informed Shrinkage and Sparsity
- Grant number: 2214726
- Fiscal year: 2022
- Amount: $16,800
- Program: Standard Grant
Methods for big data, sparsity, and environmental thresholds
- Grant number: RGPIN-2021-03970
- Fiscal year: 2022
- Amount: $16,800
- Program: Discovery Grants Program - Individual
Sparsity, thresholding and regularization in data science
- Grant number: RGPIN-2022-04531
- Fiscal year: 2022
- Amount: $16,800
- Program: Discovery Grants Program - Individual
Sparsity, thresholding and regularization in data science
- Grant number: DGECR-2022-00453
- Fiscal year: 2022
- Amount: $16,800
- Program: Discovery Launch Supplement
Addressing Sparsity in Metabolomics Data Analysis
- Grant number: 10396831
- Fiscal year: 2021
- Amount: $16,800
Methods for big data, sparsity, and environmental thresholds
- Grant number: DGECR-2021-00271
- Fiscal year: 2021
- Amount: $16,800
- Program: Discovery Launch Supplement
Methods for big data, sparsity, and environmental thresholds
- Grant number: RGPIN-2021-03970
- Fiscal year: 2021
- Amount: $16,800
- Program: Discovery Grants Program - Individual
Overcoming Data Sparsity in Machine Translation
- Grant number: RGPIN-2017-05875
- Fiscal year: 2021
- Amount: $16,800
- Program: Discovery Grants Program - Individual