RAPID: Automated discovery of COVID-19 related hypotheses using publicly available scientific literature

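Basic Information

  • Award number:
    2127776
  • Principal investigator:
  • Amount:
    $104,500
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2020
  • Funding country:
    United States
  • Start and end dates:
    2020-10-01 to 2022-04-30
  • Project status:
    Completed

Project Abstract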

The vast amount of biomedical information accumulating in modern databases (such as MEDLINE of the National Library of Medicine) makes it very difficult for researchers to survey the field efficiently and broadly when they evaluate new information against the existing biomedical literature, even when advanced search engines are used. Automated hypothesis generation systems are designed to help scientists overcome these difficulties and accelerate their research. The COVID-19 pandemic is precisely one of the cases in which such systems can play an extremely important role in coping with the coronavirus. Using two different AI approaches, we have developed two systems to discover plausible hypotheses in the biomedical domain. In this project, we will deploy the COVID-19 customized hypothesis generation and knowledge discovery system, run it at scale on all queries relevant to this research, and publish the results (including trained AI models and discovered information) in the open domain for the broad scientific community, with the goal of accelerating COVID-19 research. This work focuses heavily on addressing fundamental knowledge discovery questions by modeling and formulating scientific hypotheses using publicly available information in the biomedical domain. In general, however, these methods are not restricted to any specific information domain, i.e., they can be broadly used to discover knowledge in texts. Although our experimental work will be related to COVID-19, the methods can be applied, with some reservations, to any literature-based analysis. For example, in the Materials Science Initiative, one of the goals is to establish a systematic understanding of material properties and to discover new materials, which can be done by analyzing the massive corpus of papers. In the legal world, identifying related patents can be done using a similar hypothesis modeling methodology.
At the heart of the proposed approach lies a large multi-modal and multi-relational semantic knowledge network of all biomedical objects extracted from a variety of heterogeneous databases of the National Library of Medicine. These objects include but are not limited to scientific papers, abstracts, keywords, phrases, elements of thesaurus, genes, proteins, mutations, pathways, diseases, and diagnoses. We will leverage two systems, namely MOLIERE and AGATHA, that are based on structural and deep learning, respectively. We will customize them using the rapidly updated dataset of new papers that have not yet been processed by the National Library of Medicine but already exist in the open domain, such as in various preprint archives and reports. The MOLIERE system is based on network analysis techniques applied to a graph constructed using low-dimensional embeddings of the papers, with result interpretation methods based on probabilistic topic modeling. The AGATHA system processes texts at a much finer granularity and creates a semantic knowledge network using more accurate embedding techniques, followed by deep learning training for knowledge discovery. The two systems complement each other: while AGATHA is of higher quality, MOLIERE is more interpretable. A combination of both will be leveraged in this research.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
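
The abstract describes MOLIERE only at a high level: a graph is built from low-dimensional embeddings of papers and then analyzed with network techniques. The following is a minimal sketch of that general idea, not the project's implementation. It assumes cosine distance, a k-nearest-neighbor graph, and a shortest-path query as a stand-in for the network analysis step; none of these choices are stated in the abstract, and the names `toy_embeddings`, `build_knn_graph`, and `shortest_path` are hypothetical.

```python
import heapq
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def build_knn_graph(embeddings, k):
    """Link each item to its k nearest neighbors (undirected, weighted)."""
    graph = {name: {} for name in embeddings}
    for name, vec in embeddings.items():
        dists = sorted(
            (cosine_distance(vec, other_vec), other)
            for other, other_vec in embeddings.items()
            if other != name
        )
        for dist, other in dists[:k]:
            graph[name][other] = dist
            graph[other][name] = dist
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the node list, or None if unreachable."""
    best = {source: 0.0}
    prev = {}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return path[::-1]
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbor, weight in graph[node].items():
            candidate = dist + weight
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    return None


# Toy 2-D "embeddings" (assumed data, for illustration only).
toy_embeddings = {
    "paper_a": (1.0, 0.0),
    "paper_b": (0.9, 0.45),
    "paper_c": (0.45, 0.9),
    "paper_d": (0.0, 1.0),
}
knn_graph = build_knn_graph(toy_embeddings, k=2)
path = shortest_path(knn_graph, "paper_a", "paper_d")
print(path)
```

In this sketch, the intermediate nodes on the path between two query items play the role of candidate connecting literature; the abstract says that MOLIERE interprets its results with probabilistic topic modeling, which is not shown here.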

Project Outcomes

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Accelerating COVID-19 Research with Graph Mining and Transformer-Based Learning

Other publications by Ilya Safro

FAIRLEARN: Configurable and Interpretable Algorithmic Fairness
  • DOI:
  • Published:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Ankit Kulshrestha; Ilya Safro
  • Corresponding author:
    Ilya Safro
Algebraic Distance on Graphs
Multilevel Graph Partitioning for Three-Dimensional Discrete Fracture Network Flow Simulations
  • DOI:
    10.1007/s11004-021-09944-y
  • Published:
    2021-05-26
  • Journal:
  • Impact factor:
    3.600
  • Authors:
    Hayato Ushijima-Mwesigwa; Jeffrey D. Hyman; Aric Hagberg; Ilya Safro; Satish Karra; Carl W. Gable; Matthew R. Sweeney; Gowri Srinivasan
  • Corresponding author:
    Gowri Srinivasan
Randomized heuristics for exploiting Jacobian scarcity
A Measure of the Connection Strengths between Graph Vertices with Applications
  • DOI:
  • Published:
    2009
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jie Chen; Ilya Safro
  • Corresponding author:
    Ilya Safro

Other grants held by Ilya Safro

RAPID: Automated discovery of COVID-19 related hypotheses using publicly available scientific literature
  • Award number:
    2027864
  • Fiscal year:
    2020
  • Funding amount:
    $104,500
  • Award type:
    Standard Grant
Collaborative Research: EAGER: QIA: Large Scale QAOA Quantum Simulator
  • Award number:
    2035606
  • Fiscal year:
    2020
  • Funding amount:
    $104,500
  • Award type:
    Standard Grant
Collaborative Research: EAGER: QIA: Large Scale QAOA Quantum Simulator
  • Award number:
    2122793
  • Fiscal year:
    2020
  • Funding amount:
    $104,500
  • Award type:
    Standard Grant
EAGER: SSDIM: Multiscale Methods for Generating Infrastructure Networks
  • Award number:
    1745300
  • Fiscal year:
    2017
  • Funding amount:
    $104,500
  • Award type:
    Standard Grant
EAGER: Feedback-based Network Optimization for Smart Cities
  • Award number:
    1647361
  • Fiscal year:
    2016
  • Funding amount:
    $104,500
  • Award type:
    Standard Grant
Fast and Scalable Multigrid Methods for Hypergraph Partitioning Problems
  • Award number:
    1522751
  • Fiscal year:
    2015
  • Funding amount:
    $104,500
  • Award type:
    Standard Grant

Similar overseas grants

A Semi-Automated Antibody-Discovery Platform to Target Challenging Biomolecules
  • Award number:
    MR/Y003616/1
  • Fiscal year:
    2024
  • Funding amount:
    $104,500
  • Award type:
    Fellowship
Automated reactor platforms for accelerated discovery of next generation polymers
  • Award number:
    2911012
  • Fiscal year:
    2024
  • Funding amount:
    $104,500
  • Award type:
    Studentship
Automated Discovery and Screening of Stimuli-Responsive Porous Liquids
  • Award number:
    2896345
  • Fiscal year:
    2023
  • Funding amount:
    $104,500
  • Award type:
    Studentship
Computational Infrastructure for Automated Force Field Development and Optimization
  • Award number:
    10699200
  • Fiscal year:
    2023
  • Funding amount:
    $104,500
  • Award type:
SyncroPatch 384 Automated Patch Clamp Instrument
  • Award number:
    10721590
  • Fiscal year:
    2023
  • Funding amount:
    $104,500
  • Award type:
DMREF/Collaborative Research: Accelerated Discovery of Sustainable Bioplastics: Automated, Tunable, Integrated Design, Processing and Modeling
  • Award number:
    2323976
  • Fiscal year:
    2023
  • Funding amount:
    $104,500
  • Award type:
    Standard Grant
Collaborative Research: SaTC: CORE: Medium: Audacity of Exploration: Toward Automated Discovery of Security Flaws in Networked Systems through Intelligent Documentation Analysis
  • Award number:
    2409269
  • Fiscal year:
    2023
  • Funding amount:
    $104,500
  • Award type:
    Standard Grant
DMREF/Collaborative Research: Accelerated Discovery of Sustainable Bioplastics: Automated, Tunable, Integrated Design, Processing and Modeling
  • Award number:
    2323977
  • Fiscal year:
    2023
  • Funding amount:
    $104,500
  • Award type:
    Standard Grant
Automated Model Discovery for Soft Matter
  • Award number:
    2320933
  • Fiscal year:
    2023
  • Funding amount:
    $104,500
  • Award type:
    Continuing Grant
An automated high-throughput robotic platform for accelerated battery and fuels discovery - DIGIBAT
  • Award number:
    EP/W036517/1
  • Fiscal year:
    2023
  • Funding amount:
    $104,500
  • Award type:
    Research Grant