Joining graph- and vector-based sense representations for semantic end-user information access (JOIN-T 2)

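Basic information

Grant number:
259256643
Principal investigator:
Professor Dr. Christian Biemann
Funding amount:
--
Host institution:
Arbeitsbereich Sprachtechnologie (LT)
Host institution country:
Germany
Project type:
Research Grants
Fiscal year:
2014
Funding country:
Germany
Project period:
2013-12-31 to 2018-12-31
Project status:
Completed

Source:
https://gepris.dfg.de/gepris/projekt/259256643?language=en
Keywords:
Joining graph vector based sense

Project abstract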

In recent years, research in Natural Language Processing (NLP) has led to major breakthroughs in language understanding. Computational semantics is one of the key areas in NLP, and accordingly a plethora of work has focused on representations of machine-readable knowledge along orthogonal dimensions such as manual vs. automatic acquisition, lexical vs. conceptual, and dense vs. sparse representations. However, much work still lies ahead in combining these dimensions so that their strengths complement each other and yield a unified semantic model and knowledge resource for tackling complex high-end NLP tasks. We propose an approach to meaning representation in context that is based on the graph-vector duality, namely the hypothesis that both graph and vector representations of lexical entities should be used at the same time to describe the semantics of these entities. To this end, we propose a computational framework and resource that integrates all these dimensions and combines the interpretability of manually crafted resources and sparse representations with the accuracy and high coverage of dense neural embeddings. We build upon our previous work on joining ontologies with graph-based distributional semantics, and take it to the next level by: i) joining it with dense semantic vector representations (a.k.a. embeddings) of text and knowledge bases (KBs) in a unified graph-vector semantic model; ii) extending the coverage to the long tail of (infrequent) named entities, including emerging ones, by leveraging extractions from Web-scale corpora; iii) exploring the benefits of a joint lexical, distributional and ontological representation for a high-end NLP task such as the browsing of document collections along structures such as entities and events.

This is an application for the continuation of our previous project "JOIN-T". We have successfully addressed most work packages from the first project phase and plan to complete the remaining work packages in the months during proposal review. The choice of topics for this continuation is informed by the key takeaways from the first project phase, namely: i) linking distributional semantic information with lexical ontologies is possible with high accuracy; ii) disambiguating lexical items in context towards distributional senses or ontological senses is possible with high accuracy using graph-based representations, but at the expense of computational efficiency, which hampers their scaling to very large corpora.
