EVIDARA: Automated Evidential Support from Raw Data for relay agents in Biomedical KG Queries
Basic information
- Grant number: 10057190
- Principal investigator:
- Amount: $896,000
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2020
- Funding country: United States
- Project period: 2020-01-24 to 2024-11-30
- Project status: Completed
- Source:
- Keywords: Address; Algorithms; Big Data; Chickens; Conflict (Psychology); Data; Data Analytics; Data Set; Databases; Development; Disease; Epidemiology; Epistemology; Glycine decarboxylase; Knowledge; Learning; Malignant Neoplasms; Measurement; Medical; Molecular; Multiomic Data; Names; Pathway interactions; Provider; Quality Control; Recording of previous events; Research; Research Personnel; Resources; Role; Science; Signal Transduction; Source; System; Technical Expertise; Testing; Vision; Vitamin K; Walking; Weight; Work; base; biobank; cancer risk; cohort; disorder risk; egg; experience; implementation research; improved; interoperability; knowledge graph; medical specialties; multiple omics; programs; stem cells; tool
Project abstract
(1) Component: Autonomous Relay Agent (ARA). We will develop an ARA named EVIDARA to evaluate
the results returned by queries to knowledge sources (KS) using a new epistemology: the “reasoning”
is based on checking against empirical evidence available in raw data (measurements) instead of
deductive reasoning (see FIG.). EVIDARA will assist the Autonomous Relay System (ARS) in identifying
paths in returned knowledge graphs (KG) that may conflict with real-world evidence and in relaying
queries to an appropriate specialty KS or database.
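The idea of checking KG paths against raw measurements can be sketched as follows. This is only an illustration under stated assumptions, not EVIDARA's actual method: it assumes each edge asserts a positive or negative association, uses a plain Pearson correlation as the evidence score, and the names `check_path`, `min_abs_r`, the variable names, and the toy measurements are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant variable carries no correlative evidence
    return cov / sqrt(vx * vy)

def check_path(path, raw, min_abs_r=0.3):
    """Label each edge (source, target, expected_sign) of a KG path.

    Labels: 'unobserved' (a node has no raw data), 'inconclusive'
    (|r| below the assumed threshold), 'supported' (sign of r matches
    the asserted sign), or 'conflict' (sign of r contradicts it).
    """
    labels = []
    for source, target, expected_sign in path:
        if source not in raw or target not in raw:
            labels.append((source, target, "unobserved"))
            continue
        r = pearson(raw[source], raw[target])
        if abs(r) < min_abs_r:
            label = "inconclusive"
        elif (r > 0) == (expected_sign > 0):
            label = "supported"
        else:
            label = "conflict"
        labels.append((source, target, label))
    return labels

# Toy data: made-up values for four hypothetical participants.
raw = {
    "protein_p": [1.0, 2.0, 3.0, 4.0],
    "signal_s": [1.1, 1.9, 3.2, 3.9],
    "disease_d_risk": [4.0, 3.1, 2.2, 0.9],
}
# Hypothetical returned path; +1 means "asserted positive association".
path = [
    ("protein_p", "signal_s", +1),
    ("signal_s", "disease_d_risk", +1),
    ("disease_d_risk", "node_x", +1),
]
result = check_path(path, raw)
```

Edges labelled 'conflict' are the ones an agent of this kind might flag to the ARS, while 'unobserved' edges hint at where another KS or data set would be needed.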
(2) Problem addressed: Electronic health record (EHR) and multi-omics raw data from large cohorts,
if properly preprocessed [e.g., by Knowledge Providers such as DOCKET, see application by Dr. Glusman],
offer a new opportunity for ad hoc, systematic extraction of empirical knowledge on relationships
(“Protein P level correlates with risk for disease D”) instead of relying on specific epidemiological
analyses. The problem in harnessing raw data for empirical support in lieu of deductive reasoning
is that the KGs to be evaluated are extracted from knowledge sources of distinct types and that
the relevance of paths depends on the query context Q. In addition, the ARA algorithm should scale
to digest the emerging multi-omics data from projects such as All of Us and the UK Biobank.
(3) Plan for implementation: Research will be conducted to evaluate a new epistemic realm:
making empirical evidence central to “reasoning”. We have assembled a set of functioning tools to
overcome the chicken-and-egg problem of getting a project started and to jumpstart development and
testing of EVIDARA: (i) SPOKE, one of the largest biomedical knowledge networks (KN), has integrated
25 diverse KS into a single (Neo4j) network database of 2 million nodes and will serve
as a testing ground for research well before we can use KGs produced by the Knowledge Providers.
(ii) Algorithms that use raw data from EHR and multi-omics studies to evaluate the returned KGs.
For instance, we compute weights of all nodes in the entire KN through a random-walk algorithm
biased by their role for a given condition Q observed in the raw data. (iii) Raw data beyond EHR:
multi-omics profiles from a study at ISB with >10k variables, which vastly exceeds the coverage of
observable nodes in KNs offered by EHRs. Example query: “Vitamin K stimulates stem-cell signaling,
thus could promote cancer. What is the molecular pathway?” Mechanisms returned as a KG
will be pruned by EVIDARA and checked against correlative evidence in the raw data: Is there
evidence that taking Vit. K or its antagonist reduces cancer risk? Importantly, since EVIDARA
learns on a network of many types of KS, it will provide information to the ARS about which type
of KS/Knowledge Provider to invoke next (in iterative queries) to improve the knowledge graph.
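The node weighting mentioned in (ii) can be illustrated with a random walk with restart, where the restart distribution is the bias derived from raw data for condition Q. The abstract does not specify the exact algorithm, so this is a stand-in sketch: the function name `biased_random_walk`, the `restart` and `iters` parameters, the undirected toy graph, and the bias values are all assumptions.

```python
def biased_random_walk(edges, bias, restart=0.15, iters=100):
    """Random walk with restart on an undirected graph.

    edges: iterable of (u, v) node pairs.
    bias:  dict mapping node -> non-negative weight, assumed to come
           from raw data for the query condition Q.
    Returns a dict node -> stationary visiting probability (sums to 1).
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    nodes = sorted(neighbors)

    total = sum(bias.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("bias must give positive weight to at least one node")
    # The walker jumps back to biased nodes with probability `restart`.
    restart_dist = {n: bias.get(n, 0.0) / total for n in nodes}

    weights = dict(restart_dist)
    for _ in range(iters):
        nxt = {n: restart * restart_dist[n] for n in nodes}
        for n in nodes:
            # Spread the remaining mass evenly over the node's neighbors.
            share = (1.0 - restart) * weights[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        weights = nxt
    return weights

# Toy network: a chain of four hypothetical nodes, with only node_a
# observed as relevant to condition Q in the (made-up) raw data.
edges = [("node_a", "node_b"), ("node_b", "node_c"), ("node_c", "node_d")]
weights = biased_random_walk(edges, bias={"node_a": 1.0})
```

In this toy chain the nodes close to the biased node end up with more weight than the far end, which is the behaviour one would want when ranking KG paths by their relevance to Q. A production version on a network of millions of nodes would need sparse matrix operations rather than Python dictionaries.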
(4) Expertise & resources: The MPIs, Drs. S. Baranzini (UCSF) and S. Huang (ISB), are researchers
with a long history of working with medical big data, thus offering technical expertise and the
critical SME perspective. SB’s team has created and maintains SPOKE. The uniquely self-contained
SPOKE network will allow NCATS staff to test other ARAs. SH brings decades of experience
in research on disease mechanisms and medical epistemology. His team will provide multi-omics
datasets and data analytics expertise. With his prior work in the NCATS Translator program, he is
well poised to maximize team science efficiency and help convert its vision into tangible results.
(5) Potential challenges: (i) The quality of evidential support depends on the quality of the raw data.
Quality control is beyond the scope of EVIDARA but could be provided by Knowledge Providers focusing
on new multi-omics data sets (e.g. DOCKET). (ii) Testing EVIDARA on other KS (from Knowledge
Providers) may be slowed down by interoperability issues (e.g. incompatible identifiers). Such
issues will be addressed early in Year 1 with the help of the Standard and Reference group.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by SERGIO E BARANZINI
EVIDARA: Automated Evidential Support from Raw Data for relay agents in Biomedical KG Queries
- Grant number: 10330633
- Fiscal year: 2020
- Funding amount: $896,000
- Project category:
EVIDARA: Automated Evidential Support from Raw Data for relay agents in Biomedical KG Queries
- Grant number: 10547256
- Fiscal year: 2020
- Funding amount: $896,000
- Project category:
EVIDARA: Automated Evidential Support from Raw Data for relay agents in Biomedical KG Queries
- Grant number: 10706762
- Fiscal year: 2020
- Funding amount: $896,000
- Project category:
The genetic basis of progression in multiple sclerosis
- Grant number: 10084323
- Fiscal year: 2017
- Funding amount: $896,000
- Project category:
The genetic basis of progression in multiple sclerosis
- Grant number: 9737736
- Fiscal year: 2017
- Funding amount: $896,000
- Project category:
Post GWAS approach to identify cell-specific genetic pathways underlying MS risk
- Grant number: 8925166
- Fiscal year: 2014
- Funding amount: $896,000
- Project category:
Post GWAS approach to identify cell-specific genetic pathways underlying MS risk
- Grant number: 9116321
- Fiscal year: 2014
- Funding amount: $896,000
- Project category:
Post GWAS approach to identify cell-specific genetic pathways underlying MS risk
- Grant number: 9330939
- Fiscal year: 2014
- Funding amount: $896,000
- Project category:
Similar overseas grants
Big Data Analytics: Optimization Models and Algorithms with Applications in Smart Food Supply Chains and Networks
- Grant number: RGPIN-2020-06792
- Fiscal year: 2022
- Funding amount: $896,000
- Project category: Discovery Grants Program - Individual
Large Systems and Big Data: Models, Tools, Analysis, and Algorithms
- Grant number: RGPIN-2020-04075
- Fiscal year: 2022
- Funding amount: $896,000
- Project category: Discovery Grants Program - Individual
Algorithms and Tools for Big Data Analysis and Automated Real Time Optimal or Near Optimal Decision Making for Industrial Systems
- Grant number: RGPIN-2017-05785
- Fiscal year: 2022
- Funding amount: $896,000
- Project category: Discovery Grants Program - Individual
Novel Learning-Based Visual Algorithms and Fusion Methods for High-Dimensional/Multi-Modality Big Data
- Grant number: RGPIN-2022-02948
- Fiscal year: 2022
- Funding amount: $896,000
- Project category: Discovery Grants Program - Individual
(Re)designing Clustering Algorithms for Big Data
- Grant number: RGPIN-2017-05617
- Fiscal year: 2022
- Funding amount: $896,000
- Project category: Discovery Grants Program - Individual
NCS-FO: Connectome mapping algorithms with application to community services for big data neuroscience
- Grant number: 2203524
- Fiscal year: 2021
- Funding amount: $896,000
- Project category: Standard Grant
Big Data Analytics: Optimization Models and Algorithms with Applications in Smart Food Supply Chains and Networks
- Grant number: RGPIN-2020-06792
- Fiscal year: 2021
- Funding amount: $896,000
- Project category: Discovery Grants Program - Individual
Exploring Novel Mathematical Models and Efficient Algorithms to Discover Periodic Spatial Patterns in Irregular Spatiotemporal Big Data
- Grant number: 21K12034
- Fiscal year: 2021
- Funding amount: $896,000
- Project category: Grant-in-Aid for Scientific Research (C)
A comprehensive study of big data clustering algorithms
- Grant number: 571110-2018
- Fiscal year: 2021
- Funding amount: $896,000
- Project category: Alexander Graham Bell Canada Graduate Scholarships - Master's
(Re)designing Clustering Algorithms for Big Data
- Grant number: RGPIN-2017-05617
- Fiscal year: 2021
- Funding amount: $896,000
- Project category: Discovery Grants Program - Individual


















