A Natural Language Based Data Retrieval Engine for Automated Digital Data Extraction for Civil Infrastructure Projects

Basic Information

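  • Award number:
    1635309
  • Principal investigator:
    H David Jeong
  • Amount:
    $285,300
  • Institution:
  • Institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2016
  • Funding country:
    United States
  • Duration:
    2016-09-01 to 2018-11-30
  • Status:
    Completed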

Project Abstract

This research project will create new knowledge and resources to significantly enhance the reusability of digital data during the life cycle of civil infrastructure assets. The rapid development of digital technologies is transforming how civil infrastructure asset data and information are produced, exchanged, and managed throughout the asset life cycle. Despite growing digital data availability, such data cannot be fully exploited without the ability to infer meaning from the varied terminology that practitioners use when entering data. The lack of a common understanding of the same data, or of similar data given in different terms, precludes data exchange or can lead to extraction of the wrong data and misinterpretation. This research project will leverage advances in linguistics and computer science to develop a novel approach that recognizes users' intentions from their natural language input and automatically extracts the desired data from heterogeneous datasets. The results of this research will benefit the construction industry by accelerating its transition to digital data-based project delivery and asset management. The research will also broaden engineering education by creating advanced course materials at both the undergraduate and graduate levels.

Diversity in data terminology creates a major hurdle for computer-to-computer communication and places a heavy burden on end users, who must act as middleware in digital data exchange. This issue exists throughout the life cycle of a civil infrastructure asset. This project will develop a computational theory, and a platform implementing it, that analyzes users' plain-English data requirements and automatically matches their intentions to data entities in heterogeneous source datasets based on semantic equivalence.
To accomplish this goal, the research team will: a) use Natural Language Processing and machine learning techniques to recognize users' intentions from their natural language queries; b) use a text-based automated ontology learning method to translate text-based domain knowledge into an extensive machine-readable civil engineering dictionary that defines the meanings of technical terms; c) design an algorithm that finds the most semantically relevant data entities in digital datasets for a given keyword input; and d) test the accuracy of the algorithm using civil infrastructure text documents such as technical specifications, design manuals, and guidelines. The research outcomes will provide fundamental tools and resources that other researchers and industry professionals can use in various text-mining and intelligence-inference systems. They will also facilitate seamless data exchange between the various proprietary software applications used during the life cycle of civil infrastructure assets, including applications for design evaluation and selection, digital model construction, and regulatory compliance checking.
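Step c) of the plan above, matching a keyword to the most semantically relevant data entities, can be illustrated with a minimal Python sketch. It assumes a tiny hand-written synonym table (`SYNONYMS`, hypothetical) standing in for the machine-readable civil engineering dictionary of step b), and it scores relevance as the Jaccard overlap between the canonical concepts of the keyword and those of each data entity name. The entity names (`NumberOfLanes`, `LANE_CNT`, `pvmt_wid`, `SurfaceType`) are invented examples; this is not the project's published algorithm.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical mini-dictionary mapping surface terms and abbreviations to
# canonical concepts. It stands in for the learned ontology; entries are illustrative.
SYNONYMS: Dict[str, str] = {
    "lane": "lane", "lanes": "lane",
    "number": "count", "num": "count", "no": "count", "cnt": "count", "count": "count",
    "width": "width", "wid": "width",
    "pavement": "pavement", "pvmt": "pavement", "surface": "pavement",
}

# Function words that carry no data meaning in a keyword or field name.
STOP_WORDS: Set[str] = {"a", "an", "the", "of", "on", "in", "for", "per"}


def tokenize(name: str) -> List[str]:
    """Split camelCase, snake_case, and space-separated names into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return [tok.lower() for tok in re.split(r"[^A-Za-z0-9]+", spaced) if tok]


def concepts(name: str) -> Set[str]:
    """Map tokens to canonical concepts; unknown tokens are kept unchanged."""
    return {SYNONYMS.get(tok, tok) for tok in tokenize(name) if tok not in STOP_WORDS}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Intersection size divided by union size (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_entities(keyword: str, entities: List[str]) -> List[Tuple[str, float]]:
    """Return (entity, score) pairs sorted from most to least relevant."""
    query = concepts(keyword)
    scored = [(entity, jaccard(query, concepts(entity))) for entity in entities]
    # sorted() is stable even with reverse=True, so ties keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    fields = ["NumberOfLanes", "LANE_CNT", "pvmt_wid", "SurfaceType"]
    for entity, score in rank_entities("number of lanes", fields):
        print(f"{entity:15s} {score:.2f}")
```

With this toy dictionary, both `NumberOfLanes` and `LANE_CNT` score 1.00 for "number of lanes", although a literal string comparison would miss `LANE_CNT`. The quality of such matching rests almost entirely on the dictionary, which is why the plan pairs step c) with the ontology work in step b).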
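For step d), the abstract does not say which accuracy measure is used. A common choice for ranked retrieval is top-k accuracy: the share of test keywords whose correct data entity appears among the k highest-ranked candidates. The Python sketch below assumes that metric and a hypothetical gold-standard mapping; `naive_ranker` is a deliberately simple string-containment baseline, not the project's method, included only so the example runs.

```python
from typing import Callable, Dict, List, Sequence

# A ranker takes a keyword and candidate entity names and returns the names,
# best match first.
Ranker = Callable[[str, Sequence[str]], List[str]]


def top_k_accuracy(ranker: Ranker, gold: Dict[str, str],
                   entities: Sequence[str], k: int = 1) -> float:
    """Fraction of keywords whose gold entity appears in the ranker's top k."""
    if not gold:
        return 0.0
    hits = sum(1 for keyword, target in gold.items()
               if target in ranker(keyword, entities)[:k])
    return hits / len(gold)


def naive_ranker(keyword: str, entities: Sequence[str]) -> List[str]:
    """Baseline: entities whose squashed name contains the squashed keyword come first."""
    key = keyword.lower().replace(" ", "")
    # Boolean sort key with reverse=True puts True (contains the keyword) first.
    return sorted(entities,
                  key=lambda e: key in e.lower().replace("_", ""),
                  reverse=True)


if __name__ == "__main__":
    # Hypothetical labeled test set: keyword -> correct entity name.
    gold = {
        "number of lanes": "LANE_CNT",
        "pavement width": "pvmt_wid",
        "surface type": "SurfaceType",
    }
    entities = ["SurfaceType", "pvmt_wid", "LANE_CNT"]
    for k in (1, 3):
        print(f"top-{k} accuracy: {top_k_accuracy(naive_ranker, gold, entities, k):.2f}")
```

The baseline ranks only `SurfaceType` correctly (top-1 accuracy 0.33), because abbreviated field names such as `pvmt_wid` share no literal substring with their keywords; top-3 accuracy is trivially 1.00 here because there are only three candidates. Any ranker with the same signature can be plugged in for comparison.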

Project Outcomes

Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Parsing Natural Language Queries for Extracting Data from Large-Scale Geospatial Transportation Asset Repositories
  • DOI:
    10.1061/9780784481295.008
  • Published:
    2018-03
  • Journal:
  • Impact factor:
    0
  • Authors:
    Tuyen Le; H. D. Jeong; Stephen B Gilbert; E. Chukharev-Hudilainen
  • Corresponding authors:
    Tuyen Le; H. D. Jeong; Stephen B Gilbert; E. Chukharev-Hudilainen
NLP-Based Approach to Semantic Classification of Heterogeneous Transportation Asset Data Terminology
  • DOI:
    10.1061/(asce)cp.1943-5487.0000701
  • Published:
    2017-11
  • Journal:
  • Impact factor:
    0
  • Authors:
    Tuyen Le; H. D. Jeong
  • Corresponding authors:
    Tuyen Le; H. D. Jeong

Other Grants by H David Jeong

REU Site: Smart and Sustainable Construction in the Digital Era
  • Award number:
    2244490
  • Fiscal year:
    2023
  • Award amount:
    $285,300
  • Award type:
    Standard Grant
A Natural Language Based Data Retrieval Engine for Automated Digital Data Extraction for Civil Infrastructure Projects
  • Award number:
    1854400
  • Fiscal year:
    2018
  • Award amount:
    $285,300
  • Award type:
    Standard Grant

Similar International Grants

CAREER: Insertion-Based Natural Language Generation
  • Award number:
    2339766
  • Fiscal year:
    2024
  • Award amount:
    $285,300
  • Award type:
    Continuing Grant
Deep Learning Based Natural Language Processing Markers of Anxiety and Depression
  • Award number:
    10723819
  • Fiscal year:
    2023
  • Award amount:
    $285,300
  • Award type:
Improving flexibility and performance of the Acute Care Enhanced Surveillance (ACES) System for public health surveillance: an ensemble of state-of-the-art machine learning and rule-based natural language processing methods
  • Award number:
    468864
  • Fiscal year:
    2022
  • Award amount:
    $285,300
  • Award type:
    Operating Grants
A Natural Language Processing-based Chatbot for Smoking Cessation
  • Award number:
    572579-2022
  • Fiscal year:
    2022
  • Award amount:
    $285,300
  • Award type:
    University Undergraduate Student Research Awards
Valid N-Grams Identification Web Service based on Statistical Natural Language Processing Techniques
  • Award number:
    579993-2022
  • Fiscal year:
    2022
  • Award amount:
    $285,300
  • Award type:
    University Undergraduate Student Research Awards
A Natural Language Processing-based Chatbot for Smoking Cessation
  • Award number:
    572664-2022
  • Fiscal year:
    2022
  • Award amount:
    $285,300
  • Award type:
    University Undergraduate Student Research Awards
The mechanism of loneliness based on self-memory system-An integrative approach of Psychology and Natural Language Processing
  • Award number:
    21K13683
  • Fiscal year:
    2021
  • Award amount:
    $285,300
  • Award type:
    Grant-in-Aid for Early-Career Scientists
Investigation of the viability of direct evaluation of natural language queries with respect to event based triplestores
  • Award number:
    RGPIN-2016-04502
  • Fiscal year:
    2021
  • Award amount:
    $285,300
  • Award type:
    Discovery Grants Program - Individual
An in-depth study on the relationship between e-word-of-mouth and consumer behavior based on natural language processing
  • Award number:
    21K13386
  • Fiscal year:
    2021
  • Award amount:
    $285,300
  • Award type:
    Grant-in-Aid for Early-Career Scientists
Natural Language Processing-based Chatbots for Addiction Therapy
  • Award number:
    562070-2021
  • Fiscal year:
    2021
  • Award amount:
    $285,300
  • Award type:
    University Undergraduate Student Research Awards