DOCKET: accelerating knowledge extraction from biomedical data sets

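Basic information

  • Grant number:
    10330627
  • Principal investigator:
  • Amount:
    $676,800
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2020
  • Funding country:
    United States
  • Project period:
    2020-01-24 to 2022-01-23
  • Project status:
    Completed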

Project abstract

Component type: This Knowledge Provider project will continue and significantly extend work done by the Translator Consortium Blue Team, focusing on deriving knowledge from real-world data through complex analytic workflows, integrated to the Translator Knowledge Graph, and served via tools like Big GIM and the Translator Standard API.

The problem: We aim to solve the “first mile” problem of translational research: how to integrate the multitude of dynamic small-to-large data sets that have been produced by the research and clinical communities, but that are in different locations, processed in different ways, and in a variety of formats that may not be mutually interoperable. Integrating these data sets requires significant manual work downloading, reformatting, parsing, indexing and analyzing each data set in turn. The technical and ethical challenges of accessing diverse collections of big data, efficiently selecting information relevant to different users’ interests, and extracting the underlying knowledge are problems that remain unsolved. Here, we propose to leverage lessons distilled from our previous and ongoing big data analysis projects to develop a highly automated tool for removing these bottlenecks, enabling researchers to analyze and integrate many valuable data sets with ease and efficiency, and making the data FAIR [1].

Plan: (AIM 1) We will analyze and extract knowledge from rich real-world biomedical data sets (listed in the Resources page) in the domains of wellness, cancer, and large-scale clinical records. (AIM 2) We will formalize methods from Aim 1 to develop DOCKET, a novel tool for onboarding and integrating data from multiple domains. (AIM 3) We will work with other teams to adapt DOCKET to additional knowledge domains.

  • The DOCKET tool will offer 3 modules:
    (1) DOCKET Overview: Analysis of, and knowledge extraction from, an individual data set.
    (2) DOCKET Compare: Comparing versions of the same data set to compute confidence values, and comparing different data sets to find commonalities.
    (3) DOCKET Integrate: Deriving knowledge through integrating different data sets.
  • Researchers will be able to parameterize these functions, resolve inconsistencies, and derive knowledge through the command line, Jupyter notebooks, or other interfaces as specified by Translator Standards.
  • The outcome will be a collection of nodes and edges, richly annotated with context, provenance and confidence levels, ready for incorporation into the Translator Knowledge Graph (TKG).
  • All analyses and derived knowledge will be stored in standardized formats, enabling querying through the Reasoner Std API and ingestion into downstream AI assisted machine learning.
  • Example questions this will allow us to address include:
    (Wellness) Which clinical analytes, metabolites, proteins, microbiome taxa, etc. are significantly correlated, and which changing analytes predict transition to which disease? [2,3]
    (Cancer) Which gene mutations in any of X pathways are associated with sensitivity or resistance to any of Y drugs, in cell lines from Z tumor types?
    (All data sets) Which data set entities are similar to this one? Are there significant clusters? What distinguishes between the clusters? What significant correlations of attributes can be observed? How can this set of entities be expanded by adding similar ones? How do these N versions of this data set differ, and how stable is each knowledge edge as the data set changes over time?

Collaboration strengths: Our team has extensive experience with biomedical and domain-agnostic data analytics, integrating multiple relevant data types: omics, clinical measurements and electronic health records (EHRs). We have participated in large collaborative consortia and have subject matter experts willing to advise on proper data interpretation. Our application synergizes with those of other Translator teams (see Letters of Collaboration).

Challenges: Data can come in a bewildering diversity of formats. Our solution will be modular, will address the most common formats first, and will leverage established technologies like DataFrames and importers (like pandas.io) where possible. Mapping nodes and edge types onto standard ontologies is crucial for knowledge integration; we will collaborate with the Standards component to maximize success.
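
The abstract's idea of comparing versions of a data set to compute confidence values, and of asking how stable each knowledge edge is as the data change, could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not DOCKET's actual implementation: the function names `correlation_edges` and `edge_stability`, the 0.8 correlation threshold, the analyte column names, and the toy values are all hypothetical. It relies only on pandas DataFrames, which the abstract names as a technology to leverage.

```python
import pandas as pd


def correlation_edges(df: pd.DataFrame, threshold: float = 0.8) -> set:
    """Return column pairs whose absolute Pearson correlation meets the threshold.

    Each pair stands in for one candidate "edge" between two nodes (columns).
    """
    corr = df.select_dtypes("number").corr()  # pairwise Pearson correlation matrix
    cols = list(corr.columns)
    edges = set()
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:  # visit each unordered pair once
            if abs(corr.loc[a, b]) >= threshold:
                edges.add((a, b))
    return edges


def edge_stability(versions: list, threshold: float = 0.8) -> dict:
    """Map each edge to the fraction of data set versions in which it appears.

    The fraction is a crude, assumed stand-in for a confidence value.
    """
    per_version = [correlation_edges(v, threshold) for v in versions]
    all_edges = set().union(*per_version)
    return {
        edge: sum(edge in found for found in per_version) / len(versions)
        for edge in sorted(all_edges)
    }


# Two hypothetical versions of the same toy data set (values are made up).
v1 = pd.DataFrame({
    "glucose": [1, 2, 3, 4, 5],
    "hba1c": [2, 4, 6, 8, 10],
    "ldl": [5, 1, 4, 2, 3],
})
v2 = pd.DataFrame({
    "glucose": [1, 2, 3, 4, 5],
    "hba1c": [2, 4, 6, 8, 10],
    "ldl": [1, 2, 3, 4, 6],
})

stability = edge_stability([v1, v2])
for edge, fraction in stability.items():
    print(edge, fraction)
```

In this toy case the glucose/hba1c edge is found in both versions, while the two edges involving ldl are found only in the second, so they receive a lower stability score. A real workflow would also need significance testing, provenance annotation, and mapping of column names onto standard ontologies, none of which is attempted here.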

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Gwênlyn Glusman

DOCKET: accelerating knowledge extraction from biomedical data sets
  • Grant number:
    10057127
  • Fiscal year:
    2020
  • Funding amount:
    $676,800
  • Project category:
DOCKET: accelerating knowledge extraction from biomedical data sets
  • Grant number:
    10548024
  • Fiscal year:
    2020
  • Funding amount:
    $676,800
  • Project category:
DOCKET: accelerating knowledge extraction from biomedical data sets
  • Grant number:
    10706750
  • Fiscal year:
    2020
  • Funding amount:
    $676,800
  • Project category:
Biomedical Data Translator Technical Feasibility Assessment and Architecture Design
  • Grant number:
    9338977
  • Fiscal year:
    2016
  • Funding amount:
    $676,800
  • Project category:
Biomedical Data Translator Technical Feasibility Assessment and Architecture Design
  • Grant number:
    9486059
  • Fiscal year:
    2016
  • Funding amount:
    $676,800
  • Project category:

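Similar National Natural Science Foundation of China (NSFC) grants

Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
  • Grant number:
  • Approval year:
    2024
  • Funding amount:
    not listed (unit: 10,000 CNY)
  • Project category:
    Collaborative Innovation Research Team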

Similar overseas grants

Conference: Theory and Foundations of Statistics in the Era of Big Data
  • Grant number:
    2403813
  • Fiscal year:
    2024
  • Funding amount:
    $676,800
  • Project category:
    Standard Grant
FightAMR: Novel global One Health surveillance approach to fight AMR using Artificial Intelligence and big data mining
  • Grant number:
    MR/Y034422/1
  • Fiscal year:
    2024
  • Funding amount:
    $676,800
  • Project category:
    Research Grant
Exploring Hotel Customer Experiences in Japan via Big Data and Large Language Model Analysis
  • Grant number:
    24K21025
  • Fiscal year:
    2024
  • Funding amount:
    $676,800
  • Project category:
    Grant-in-Aid for Early-Career Scientists
CC* Networking Infrastructure: Enabling Big Science and Big Data Projects at the University of Massachusetts
  • Grant number:
    2346286
  • Fiscal year:
    2024
  • Funding amount:
    $676,800
  • Project category:
    Standard Grant
Big Data-based Distributed Control using a Behavioural Systems Framework
  • Grant number:
    DP240100300
  • Fiscal year:
    2024
  • Funding amount:
    $676,800
  • Project category:
    Discovery Projects
REU Site: Online Interdisciplinary Big Data Analytics in Science and Engineering
  • Grant number:
    2348755
  • Fiscal year:
    2024
  • Funding amount:
    $676,800
  • Project category:
    Standard Grant
Market Orientation, Big Data Analysis Capability, and Business Performance: The Moderating Role of Supplier Relationship, Big data Analysis Outscoring
  • Grant number:
    24K05127
  • Fiscal year:
    2024
  • Funding amount:
    $676,800
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Generative Visual Pre-training on Unlabelled Big Data
  • Grant number:
    DP240101848
  • Fiscal year:
    2024
  • Funding amount:
    $676,800
  • Project category:
    Discovery Projects
MEGASKILLS [MEthodology of Psycho-pedagogical, Big Data and Commercial Video GAmes procedures for the European SKILLS Agenda Implementation]
  • Grant number:
    10069843
  • Fiscal year:
    2023
  • Funding amount:
    $676,800
  • Project category:
    EU-Funded
Improving NHS perimenopausal diagnosis and HRT prescription through AI, machine learning and big data
  • Grant number:
    10053966
  • Fiscal year:
    2023
  • Funding amount:
    $676,800
  • Project category:
    Collaborative R&D