
RAPID: Automated discovery of COVID-19 related hypotheses using publicly available scientific literature


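Basic information

Award number:
2027864
Principal investigator:
Ilya Safro
Amount:
approximately $104,500
Institution:
Clemson University
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2020
Funding country:
United States
Duration:
2020-05-01 to 2021-04-30
Status:
Completed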

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2027864&HistoricalAwards=false
Keywords:
RAPID; Automated discovery; COVID-19

Project abstract

The vast amount of biomedical information accumulating in modern databases (such as MEDLINE of the National Library of Medicine) makes it very difficult for researchers to survey the literature broadly and efficiently when evaluating new information against existing biomedical literature, even with advanced search engines. Automated hypothesis generation systems are designed to help scientists overcome these difficulties and accelerate their research. The COVID-19 pandemic is precisely one of the cases in which such systems can play an extremely important role in coping with the coronavirus. Using two different AI approaches, we have developed two systems to discover plausible hypotheses in the biomedical domain. In this project, we will deploy the COVID-19 customized hypothesis generation and knowledge discovery system, run it at scale on any queries relevant to this research, and publish the results (including trained AI models and discovered information) in the open domain for the broad scientific community, with the goal of accelerating COVID-19 research. This work focuses heavily on addressing fundamental knowledge discovery questions by modeling and formulating scientific hypotheses using publicly available information in the biomedical domain. In general, however, these methods are not restricted to any specific information domain; they can be used broadly to discover knowledge in texts. Although our experimental work will be related to COVID-19, the methods can be applied, with some reservations, to any literature-based analysis. For example, in the Materials Science Initiative, one of the goals is to establish a systematic understanding of material properties and to discover new materials, which can be done by analyzing the massive corpus of papers. In the legal world, identifying related patents can be done using a similar hypothesis modeling methodology.
At the heart of the proposed approach lies a large multi-modal and multi-relational semantic knowledge network of all biomedical objects extracted from a variety of heterogeneous databases of the National Library of Medicine. These objects include, but are not limited to, scientific papers, abstracts, keywords, phrases, thesaurus elements, genes, proteins, mutations, pathways, diseases, and diagnoses. We will leverage two systems, MOLIERE and AGATHA, which are based on structural learning and deep learning, respectively. We will customize them using the rapidly updated dataset of new papers that have not yet been processed by the National Library of Medicine but already exist in the open domain, such as in various preprint archives and reports. The MOLIERE system applies network analysis techniques to a graph constructed from low-dimensional embeddings of the papers, and interprets the results with methods based on probabilistic topic modeling. The AGATHA system processes texts at a much finer granularity and creates a semantic knowledge network using more accurate embedding techniques, followed by deep learning training for knowledge discovery. The two systems complement each other: while AGATHA is of higher quality, MOLIERE is more interpretable. A combination of both will be leveraged in this research.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
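
As a rough illustration of the graph-based idea the abstract attributes to MOLIERE, the Python sketch below builds a k-nearest-neighbor graph over low-dimensional embeddings and searches for a shortest path between two query items; intermediate nodes on the path are candidate "bridging" items. This is not the MOLIERE implementation. The abstract does not say which network analysis technique is used, so a shortest-path search stands in for it here. The embeddings, the node names (`query_a`, `bridge_1`, and so on), the value of `k`, and the helpers `knn_graph` and `shortest_path` are all hypothetical, and the topic-modeling step used to interpret results is omitted.

```python
import heapq
import math


def knn_graph(vectors, k):
    """Build an undirected k-nearest-neighbor graph weighted by Euclidean distance."""
    graph = {name: {} for name in vectors}
    for a, va in vectors.items():
        # Distances from `a` to every other item, nearest first.
        nearest = sorted(
            (math.dist(va, vb), b) for b, vb in vectors.items() if b != a
        )
        for d, b in nearest[:k]:
            graph[a][b] = d
            graph[b][a] = d  # symmetrize so the graph is undirected
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the node list from source to target, or None."""
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            # Walk predecessor links back to the source.
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return path[::-1]
        if d > best.get(node, math.inf):
            continue  # stale heap entry
        for nbr, w in graph[node].items():
            nd = d + w
            if nd < best.get(nbr, math.inf):
                best[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return None


# Hypothetical 2-D embeddings; a real system would embed papers and terms
# in a much higher-dimensional space learned from the corpus.
EMBEDDINGS = {
    "query_a": (0.0, 0.0),
    "bridge_1": (1.0, 0.2),
    "bridge_2": (2.0, 0.1),
    "query_b": (3.0, 0.0),
    "unrelated": (0.0, 5.0),
}

graph = knn_graph(EMBEDDINGS, k=2)
path = shortest_path(graph, "query_a", "query_b")
print(path)
```

In this toy setting the two query items are not direct neighbors, so any path between them must pass through at least one intermediate node. In a literature-based setting, such intermediate nodes would be the starting point for interpretation rather than a result in themselves.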


Project outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)


Other publications by Ilya Safro

FAIRLEARN: Configurable and Interpretable Algorithmic Fairness

DOI:
Publication date:
2021
Journal:
arXiv.org
Impact factor:
0
Authors:
Ankit Kulshrestha; Ilya Safro
Corresponding author:
Ilya Safro

Algebraic Distance on Graphs

DOI:
Publication date:
2011
Journal:
SIAM Journal on Scientific Computing
Impact factor:
3.1
Authors:
Jie Chen; Ilya Safro
Corresponding author:
Ilya Safro

Multilevel Graph Partitioning for Three-Dimensional Discrete Fracture Network Flow Simulations

DOI:
10.1007/s11004-021-09944-y
Publication date:
2021-05-26
Journal:
Mathematical Geosciences
Impact factor:
3.600
Authors:
Hayato Ushijima-Mwesigwa; Jeffrey D. Hyman; Aric Hagberg; Ilya Safro; Satish Karra; Carl W. Gable; Matthew R. Sweeney; Gowri Srinivasan
Corresponding author:
Gowri Srinivasan