Natural Language Data Management

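Basic Information

  • Grant number:
    RGPIN-2018-04683
  • Principal investigator:
  • Amount:
    $20,400
  • Host institution:
  • Host institution country:
    Canada
  • Program:
    Discovery Grants Program - Individual
  • Fiscal year:
    2018
  • Funding country:
    Canada
  • Period:
    2018-01-01 to 2019-12-31
  • Status:
    Completed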

Project Abstract

A large volume of the data generated every day is in some form of natural language intended for human consumption; this includes news articles, blog posts, tweets, scientific articles, Wikipedia entries, and financial reports. However, our querying capabilities over this data have remained largely limited to keyword search, which reduces both the efficiency of search and the scope of the information that can be retrieved. Specifically, keyword search is not sufficient when the search goes beyond selection and involves join and set operations over the contents of documents. Keyword search is also poorly suited to cases where the granularity of the search result is smaller than a document.

This proposal advances research on querying natural language data and on the issues that hinder querying and managing such data (both structured and unstructured) in documents. The particular challenges to be studied are: (1) storage and indexing, (2) querying and query processing, and (3) data integration and aggregation.

(1) Standard text-based indices such as the inverted index often ignore the structure of natural language data and do not provide the best support for queries. A storage system for natural language data may track both the ordering and the lexical relations between words and between senses to better support certain classes of queries. For example, synonymy and hyponymy relationships may indicate a degree of locality, in the sense that a document that matches a word is likely to match the synonyms and hyponyms of that word as well.

(2) Natural language data may be stored in and queried using a relational database, but composing queries over such data can be cumbersome, and relational systems may not provide the best support for the queries. A natural language data management system is expected to be geared towards the needs of applications that use natural language data, providing native support and treating natural language data as a first-class citizen. In particular, natural language data may be transformed to a meaning representation to better support reasoning and entailment detection and to ease integration with other sources. The querying system can then provide some support for these transformations; it can also use the known relationships between fragments (e.g. distributional similarity) both in evaluating queries and in optimizing their evaluation.

(3) Natural language data that resides in different sources can refer to the same entities differently; even references within the same source can be ambiguous if taken out of context. Ambiguities introduce problems in integrating and aggregating data from multiple sources. Despite the progress in the area of entity resolution, many challenges remain. We will work toward addressing the challenges related to querying by exploiting new developments in knowledge bases and linked data.
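The limitation described under challenge (1) can be illustrated with a small sketch. This is not the project's method; it is a minimal toy example, assuming a hypothetical three-document corpus and a hand-written synonym table (`DOCS` and `SYNONYMS` are invented for illustration), of how a plain inverted index misses documents that a synonym-aware lookup would return.

```python
from collections import defaultdict

# Hypothetical toy corpus and synonym table, invented for illustration only.
DOCS = {
    1: "the car stopped near the bank",
    2: "an automobile was parked outside",
    3: "the river bank flooded",
}
SYNONYMS = {"car": {"automobile"}, "automobile": {"car"}}


def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index


def keyword_search(index, word):
    """Plain keyword lookup: exact word match only."""
    return set(index.get(word, set()))


def synonym_aware_search(index, word, synonyms):
    """Union the postings of the word with those of its listed synonyms."""
    result = keyword_search(index, word)
    for alt in synonyms.get(word, set()):
        result |= keyword_search(index, alt)
    return result


index = build_inverted_index(DOCS)
plain = keyword_search(index, "car")                      # document 1 only
expanded = synonym_aware_search(index, "car", SYNONYMS)   # documents 1 and 2
```

In this sketch the synonym expansion happens at query time; a storage system of the kind the abstract envisions might instead record such lexical relations in the index itself, which is a design question the abstract leaves open.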

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Rafiei, Davood

Natural Language Data Management
  • Grant number:
    RGPIN-2018-04683
  • Fiscal year:
    2022
  • Funding amount:
    $20,400
  • Program:
    Discovery Grants Program - Individual
Integrating third-party and open data with internal corporate databases
  • Grant number:
    542303-2019
  • Fiscal year:
    2021
  • Funding amount:
    $20,400
  • Program:
    Collaborative Research and Development Grants
Natural Language Data Management
  • Grant number:
    RGPIN-2018-04683
  • Fiscal year:
    2021
  • Funding amount:
    $20,400
  • Program:
    Discovery Grants Program - Individual
Natural Language Data Management
  • Grant number:
    RGPIN-2018-04683
  • Fiscal year:
    2020
  • Funding amount:
    $20,400
  • Program:
    Discovery Grants Program - Individual
Integrating third-party and open data with internal corporate databases
  • Grant number:
    542303-2019
  • Fiscal year:
    2020
  • Funding amount:
    $20,400
  • Program:
    Collaborative Research and Development Grants
Natural Language Data Management
  • Grant number:
    RGPIN-2018-04683
  • Fiscal year:
    2019
  • Funding amount:
    $20,400
  • Program:
    Discovery Grants Program - Individual
Integrating third-party and open data with internal corporate databases
  • Grant number:
    542303-2019
  • Fiscal year:
    2019
  • Funding amount:
    $20,400
  • Program:
    Collaborative Research and Development Grants
Enabling queries on relational data on the Web
  • Grant number:
    239127-2013
  • Fiscal year:
    2017
  • Funding amount:
    $20,400
  • Program:
    Discovery Grants Program - Individual
Fact extraction from organizational corpora
  • Grant number:
    522032-2017
  • Fiscal year:
    2017
  • Funding amount:
    $20,400
  • Program:
    Engage Grants Program
Enabling queries on relational data on the Web
  • Grant number:
    239127-2013
  • Fiscal year:
    2016
  • Funding amount:
    $20,400
  • Program:
    Discovery Grants Program - Individual

Similar overseas grants

CAREER: Data-driven design of graphene oxide for environmental applications enabled by natural language processing and machine learning techniques
  • Grant number:
    2238415
  • Fiscal year:
    2023
  • Funding amount:
    $20,400
  • Program:
    Continuing Grant
Applying Natural Language Processing to real-world patient data to optimise cancer care
  • Grant number:
    2897525
  • Fiscal year:
    2023
  • Funding amount:
    $20,400
  • Program:
    Studentship
RFA-CE-23-006 - Rigorous examination of anonymous reporting system data to prevent youth suicide and firearm violence: an applied natural language approach
  • Grant number:
    10786629
  • Fiscal year:
    2023
  • Funding amount:
    $20,400
  • Program:
Collaborative Research: SHF: Medium: Natural Language Models with Execution Data for Software Testing
  • Grant number:
    2313028
  • Fiscal year:
    2023
  • Funding amount:
    $20,400
  • Program:
    Standard Grant
EAGER: SSMCDAT2023: Natural Language Processing and Large Language Models for Automated Extraction of Materials Chemistry Data from Scientific Literature
  • Grant number:
    2334411
  • Fiscal year:
    2023
  • Funding amount:
    $20,400
  • Program:
    Standard Grant
Collaborative Research: SHF: Medium: Natural Language Models with Execution Data for Software Testing
  • Grant number:
    2313027
  • Fiscal year:
    2023
  • Funding amount:
    $20,400
  • Program:
    Standard Grant
Characterizing Bias and Care Disparities with Physical Restraint Use in the Emergency Setting Using Natural Language and Cognitive Data
  • Grant number:
    10431043
  • Fiscal year:
    2022
  • Funding amount:
    $20,400
  • Program:
Integrated AI analysis of natural language in nursing records by GPT and sensor data to support ward management
  • Grant number:
    22K19684
  • Fiscal year:
    2022
  • Funding amount:
    $20,400
  • Program:
    Grant-in-Aid for Challenging Research (Exploratory)
Natural Language Data Management
  • Grant number:
    RGPIN-2018-04683
  • Fiscal year:
    2022
  • Funding amount:
    $20,400
  • Program:
    Discovery Grants Program - Individual
Characterizing Bias and Care Disparities with Physical Restraint Use in the Emergency Setting Using Natural Language and Cognitive Data
  • Grant number:
    10633167
  • Fiscal year:
    2022
  • Funding amount:
    $20,400
  • Program: