A Natural Language Based Data Retrieval Engine for Automated Digital Data Extraction for Civil Infrastructure Projects


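Basic Information

  • Award number:
    1854400
  • Principal investigator:
  • Amount:
    $166,700
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2018
  • Funding country:
    United States
  • Project period:
    2018-09-01 to 2020-02-29
  • Project status:
    Completed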

Project Abstract

This research project will create new knowledge and resources to significantly enhance the reusability of digital data during the life cycle of civil infrastructure assets. The rapid development of digital technologies is transforming how civil infrastructure asset data and information are produced, exchanged, and managed throughout the asset life cycle. Despite growing digital data availability, such data cannot be fully exploited without the ability to infer meaning from the varying data terminologies entered by practitioners. The lack of a common understanding of the same data, or of similar data given in different terms, precludes data exchange or can lead to extraction of the wrong data and to misinterpretation. This research project will leverage advances in linguistics and computer science to develop a novel approach that can recognize users' intentions from their natural language input and automatically extract the desired data from heterogeneous datasets. The results of this research will benefit the construction industry by accelerating its transition to digital data-based project delivery and asset management. The research will also broaden engineering education by creating advanced course materials at both the undergraduate and graduate levels.
Diversity in data terminology is a major hurdle for computer-to-computer communication and places a heavy burden on end users, who must act as middleware in digital data exchange. This issue exists throughout the life cycle of a civil infrastructure asset. This project will develop a computational theory, and a platform that implements it, to analyze users' plain English data requirements and automatically match their intentions to the data entities in heterogeneous source datasets based on semantic equivalence. To accomplish this goal, the research team will: a) use Natural Language Processing and machine learning techniques to recognize users' intentions from their natural language queries, b) translate text-based domain knowledge into an extensive machine-readable civil engineering dictionary that defines the meanings of technical terms, using a text-based automated ontology learning method, c) design an algorithm that finds the most semantically relevant data entities in digital datasets for a given keyword input, and d) test the accuracy of the algorithm using civil infrastructure text documents such as technical specifications, design manuals, and guidelines. The research outcomes will provide fundamental tools and resources that other researchers and industry professionals can use in text-mining and intelligence-inference systems. They will facilitate seamless data exchange between the various proprietary software applications used during the life cycle of civil infrastructure assets, including applications for design evaluation and selection, digital model construction, and regulation compliance checking.
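As a rough illustration of step c) in the abstract, matching a keyword to data entities that use different terminology, the sketch below ranks field names by token overlap after mapping synonyms to a shared canonical form. It is not the project's algorithm: the SYNONYMS table is a tiny hand-written stand-in for the machine-readable dictionary that the project proposes to learn from text, the Jaccard measure is a swapped-in simple similarity, and all field names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, hand-written stand-in for the machine-readable dictionary;
# the project proposes to learn such a resource from text automatically.
SYNONYMS: Dict[str, Set[str]] = {
    "road": {"roadway", "highway"},
    "width": {"breadth"},
    "pavement": {"surfacing"},
}


def normalize(term: str) -> Set[str]:
    """Lowercase a term, split it into tokens, and map each token to its canonical form."""
    canonical: Dict[str, str] = {}
    for head, alternatives in SYNONYMS.items():
        canonical[head] = head
        for alt in alternatives:
            canonical[alt] = head
    tokens = term.lower().replace("_", " ").split()
    return {canonical.get(tok, tok) for tok in tokens}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of the canonical token sets, from 0.0 to 1.0."""
    tokens_a, tokens_b = normalize(a), normalize(b)
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def rank_entities(keyword: str, entities: List[str]) -> List[Tuple[str, float]]:
    """Return (entity, score) pairs sorted by descending similarity to the keyword."""
    scored = [(name, similarity(keyword, name)) for name in entities]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical field names from a source dataset.
    fields = ["roadway_breadth", "pavement_thickness", "bridge_span"]
    for name, score in rank_entities("road width", fields):
        print(f"{name}: {score:.2f}")
```

In this toy setup the query "road width" ranks "roadway_breadth" first even though the two share no literal token, which is the kind of semantic-equivalence match the abstract describes. A learned dictionary or embedding-based similarity would replace the lookup table in any realistic system.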

Project Outcomes

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Technical Term Similarity Model for Natural Language Based Data Retrieval in Civil Infrastructure Projects
  • DOI:
    10.22260/isarc2016/0126
  • Published:
    2016-07
  • Journal:
  • Impact factor:
    0
  • Authors:
    Tuyen Le; H. D. Jeong
  • Corresponding authors:
    Tuyen Le; H. D. Jeong

Other Grants by H David Jeong

REU Site: Smart and Sustainable Construction in the Digital Era
  • Award number:
    2244490
  • Fiscal year:
    2023
  • Amount:
    $166,700
  • Project type:
    Standard Grant
A Natural Language Based Data Retrieval Engine for Automated Digital Data Extraction for Civil Infrastructure Projects
  • Award number:
    1635309
  • Fiscal year:
    2016
  • Amount:
    $166,700
  • Project type:
    Standard Grant

Similar Overseas Grants

CAREER: Insertion-Based Natural Language Generation
  • Award number:
    2339766
  • Fiscal year:
    2024
  • Amount:
    $166,700
  • Project type:
    Continuing Grant
Deep Learning Based Natural Language Processing Markers of Anxiety and Depression
  • Award number:
    10723819
  • Fiscal year:
    2023
  • Amount:
    $166,700
  • Project type:
Improving flexibility and performance of the Acute Care Enhanced Surveillance (ACES) System for public health surveillance: an ensemble of state-of-the-art machine learning and rule-based natural language processing methods
  • Award number:
    468864
  • Fiscal year:
    2022
  • Amount:
    $166,700
  • Project type:
    Operating Grants
A Natural Language Processing-based Chatbot for Smoking Cessation
  • Award number:
    572579-2022
  • Fiscal year:
    2022
  • Amount:
    $166,700
  • Project type:
    University Undergraduate Student Research Awards
Valid N-Grams Identification Web Service based on Statistical Natural Language Processing Techniques
  • Award number:
    579993-2022
  • Fiscal year:
    2022
  • Amount:
    $166,700
  • Project type:
    University Undergraduate Student Research Awards
A Natural Language Processing-based Chatbot for Smoking Cessation
  • Award number:
    572664-2022
  • Fiscal year:
    2022
  • Amount:
    $166,700
  • Project type:
    University Undergraduate Student Research Awards
The mechanism of loneliness based on self-memory system-An integrative approach of Psychology and Natural Language Processing
  • Award number:
    21K13683
  • Fiscal year:
    2021
  • Amount:
    $166,700
  • Project type:
    Grant-in-Aid for Early-Career Scientists
Investigation of the viability of direct evaluation of natural language queries with respect to event based triplestores
  • Award number:
    RGPIN-2016-04502
  • Fiscal year:
    2021
  • Amount:
    $166,700
  • Project type:
    Discovery Grants Program - Individual
An in-depth study on the relationship between e-word-of-mouth and consumer behavior based on natural language processing
  • Award number:
    21K13386
  • Fiscal year:
    2021
  • Amount:
    $166,700
  • Project type:
    Grant-in-Aid for Early-Career Scientists
Natural Language Processing-based Chatbots for Addiction Therapy
  • Award number:
    562070-2021
  • Fiscal year:
    2021
  • Amount:
    $166,700
  • Project type:
    University Undergraduate Student Research Awards