Theoretically founded algorithms for the automatic production of analogy tests in NLP
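Basic information
- Grant number: 21K12038
- Principal investigator:
- Amount: $25,800
- Host institution:
- Host institution country: Japan
- Project category: Grant-in-Aid for Scientific Research (C)
- Fiscal year: 2021
- Funding country: Japan
- Period: 2021-04-01 to 2024-03-31
- Project status: Completed
- Source:
- Keywords:
Project abstract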
During the second year, work on tasks (a) to (c) was pursued in parallel.
(a) Two approaches were tried to cast strings into vector representations: the first directly used Parikh vector representations, and the second used a neural network with one hidden layer. Recall and precision were measured on various data.
(b) Several directions were explored.
(b.1) A series of experiments approximating real-valued vectors by integer-valued vectors was run on several analogy test sets in several languages, using the new accelerated version of the programs implemented during the first fiscal year. No parallelogram representing analogies between vectors could be discovered in any of the settings. This result has been published at an international conference.
(b.2) Words from word analogy test sets were cast into their definitions, i.e., sentences. The definitions, with the analogical structure induced by the word analogies, were used to fine-tune a sentence embedding space with contrastive learning. The fine-tuned spaces delivered better performance in semantic similarity tasks.
(b.3) Programs were written to automatically extract series of analogies from a subspace around a given word. Preliminary experiments were run on classical examples. The analogies obtained are almost always formal, although they originate from an embedding space built using the distributional hypothesis.
(c) Parallelisation of the programs is considered to have been finished in the first fiscal year.
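To make the terms in the abstract concrete, the following is a minimal illustrative sketch, not the project's own programs. It assumes the usual definition of a Parikh vector (the count of each alphabet symbol in a string) and reads the "parallelogram" condition for an analogy A : B :: C : D as A - B = C - D, component by component. The function names `parikh_vector` and `is_parallelogram` and the example words are hypothetical, chosen only for illustration.

```python
from collections import Counter


def parikh_vector(s, alphabet):
    """Return the number of occurrences in s of each symbol of alphabet."""
    counts = Counter(s)
    return [counts[ch] for ch in alphabet]


def is_parallelogram(a, b, c, d):
    """Check a - b == c - d component-wise on integer vectors."""
    return all(x - y == z - w for x, y, z, w in zip(a, b, c, d))


# Hypothetical example: walk : walked :: talk : talked
words = ("walk", "walked", "talk", "talked")
alphabet = sorted(set("".join(words)))
vectors = [parikh_vector(w, alphabet) for w in words]
print(is_parallelogram(*vectors))
```

On integer-valued vectors such as these, the equality can be tested exactly. Real-valued embedding vectors generally do not satisfy it exactly, which is presumably why the experiments in (b.1) first approximate them by integer-valued vectors.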
Project outputs
Journal papers (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Analogy on text data
- DOI:
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: Fam Rashel; Lepage Yves
- Corresponding author: Yves Lepage
Investigating parallelograms: Assessing several word embedding spaces against various analogy test sets in several languages using approximation
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: M. Eget; X. Yang; and Y. Lepage; R. Fam and Y. Lepage
- Corresponding author: R. Fam and Y. Lepage
A study in the generation of multilingually aligned middle sentences
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: M. Eget; X. Yang; Y. Lepage
- Corresponding author: Y. Lepage
A Study of Analogical Density in Various Corpora at Various Granularity
- DOI: 10.3390/info12080314
- Published: 2021-08
- Journal:
- Impact factor: 0
- Authors: Rashel Fam; Y. Lepage
- Corresponding author: Rashel Fam; Y. Lepage
Other grants held by LEPAGE YVES
Self-explainable and fast-to-train example-based machine translation using neural networks
- Grant number: 18K11447
- Fiscal year: 2018
- Amount: $25,800
- Project category: Grant-in-Aid for Scientific Research (C)
Language productivity: fast extraction of productive analogical clusters and their evaluation using statistical machine translation
- Grant number: 15K00317
- Fiscal year: 2015
- Amount: $25,800
- Project category: Grant-in-Aid for Scientific Research (C)
