RAPID: Automated discovery of COVID-19 related hypotheses using publicly available scientific literature

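Basic information

Award number:
2127776
Principal investigator:
Ilya Safro
Amount:
about $104,500
Host institution:
University of Delaware
Host institution country:
United States
Project type:
Standard Grant
Fiscal year:
2020
Funding country:
United States
Start and end dates:
2020-10-01 to 2022-04-30
Project status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2127776&HistoricalAwards=false
Keywords:
RAPID Automated discovery COVID 19

Project abstract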

The vast amount of biomedical information accumulating in modern databases (such as MEDLINE of the National Library of Medicine) makes it very difficult for researchers to survey the literature efficiently and broadly when evaluating new information against existing biomedical work, even with advanced search engines. Automated hypothesis generation systems are designed to help scientists overcome these difficulties and accelerate their research. The COVID-19 pandemic is precisely one of the cases in which such systems can play an extremely important role in coping with the coronavirus. Using two different AI approaches, we have developed two systems to discover plausible hypotheses in the biomedical domain. In this project, we will deploy the COVID-19 customized hypothesis generation and knowledge discovery system, run it at scale on any queries relevant to this research, and publish the results (including trained AI models and discovered information) in the open domain for the broad scientific community, with the goal of accelerating COVID-19 research. This work focuses heavily on addressing fundamental knowledge discovery questions by modeling and formulating scientific hypotheses using publicly available information in the biomedical domain. In general, however, these methods are not restricted to any specific information domain; they can be broadly used to discover knowledge in texts. Although our experimental work will be related to COVID-19, the methods can be applied, with some reservations, to any literature-based analysis. For example, in the Materials Science Initiative, one of the goals is to establish a systematic understanding of material properties and to discover new materials, which can be done by analyzing the massive corpus of papers. In the legal world, identifying related patents can be done using a similar hypothesis modeling methodology.
At the heart of the proposed approach lies a large multi-modal and multi-relational semantic knowledge network of all biomedical objects extracted from a variety of heterogeneous databases of the National Library of Medicine. These objects include but are not limited to scientific papers, abstracts, keywords, phrases, elements of a thesaurus, genes, proteins, mutations, pathways, diseases, and diagnoses. We will leverage two systems, MOLIERE and AGATHA, which are based on structural and deep learning, respectively. We will customize them using the rapidly updated dataset of new papers that has not yet been processed by the National Library of Medicine but already exists in the open domain, such as in various preprint archives and reports. The MOLIERE system is based on network analysis techniques applied to a graph constructed using low-dimensional embeddings of the papers, with result interpretation methods based on probabilistic topic modeling. The AGATHA system processes texts at a much finer granularity and creates a semantic knowledge network using more accurate embedding techniques, followed by deep learning training for knowledge discovery. The two systems complement each other: AGATHA is of higher quality, while MOLIERE is more interpretable. A combination of both will be leveraged in this research.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal papers (1)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Accelerating COVID-19 Research with Graph Mining and Transformer-Based Learning

DOI:
10.1609/aaai.v36i11.21543
Published:
2022
Journal:
Proceedings of the AAAI Conference on Artificial Intelligence
Impact factor:
0
Authors:
Tyagin, Ilya;Kulshrestha, Ankit;Sybrandt, Justin;Matta, Krish;Shtutman, Michael;Safro, Ilya
Corresponding author:
Safro, Ilya

Other publications by Ilya Safro

FAIRLEARN: Configurable and Interpretable Algorithmic Fairness

DOI:
Published:
2021
Journal:
arXiv.org
Impact factor:
0
Authors:
Ankit Kulshrestha;Ilya Safro
Corresponding author:
Ilya Safro

Algebraic Distance on Graphs

DOI:
Published:
2011
Journal:
SIAM Journal on Scientific Computing
Impact factor:
3.1
Authors:
Jie Chen;Ilya Safro
Corresponding author:
Ilya Safro

Multilevel Graph Partitioning for Three-Dimensional Discrete Fracture Network Flow Simulations

DOI:
10.1007/s11004-021-09944-y
Published:
2021-05-26
Journal:
Mathematical Geosciences
Impact factor:
3.600
Authors:
Hayato Ushijima-Mwesigwa;Jeffrey D. Hyman;Aric Hagberg;Ilya Safro;Satish Karra;Carl W. Gable;Matthew R. Sweeney;Gowri Srinivasan
Corresponding author:
Gowri Srinivasan