The AnVIL Data Ecosystem
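Basic information
- Grant number: 10231107
- Principal investigator:
- Amount: $4.5 million
- Host institution:
- Host institution country: United States
- Project type:
- Fiscal year: 2018
- Funding country: United States
- Project period: 2018-09-19 to 2023-06-30
- Project status: Completed
- Source: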
- Keywords: Address; Advisory Committees; All of Us Research Program; Architecture; Atlases; Catalogs; Cells; Chicago; Cloud Computing; Collaborations; Communities; Computer software; Consultations; Cost Control; Data; Data Engineering; Data Science; Data Set; Documentation; Ecosystem; Educational workshop; Ensure; Environment; Genomic Data Commons; Genotype; Goals; Human; Human Genetics; Institutes; Lead; Licensing; Metadata; Modeling; National Human Genome Research Institute; Phenotype; Play; Research; Research Personnel; Resources; Role; Secure; Security; Services; Software Engineering; Standardization; System; United States National Institutes of Health; Washington; Work; cloud based; community engaged research; community engagement; data access; data ecosystem; data modeling; data resource; data sharing; data standards; data tools; digital; experience; functional genomics; genome sciences; genomic data; indexing; innovation; interoperability; member; novel; novel strategies; open source; operation; phenotypic data; repository; software development; software systems; tool
Project Summary / Abstract
In this proposal, we bring together a unified team with a strong track record of developing secure and scalable
software systems to support flagship scientific efforts, such as the All of Us Research Program, the Genomic
Data Commons (GDC), and the Human Cell Atlas (HCA). Our group will leverage these experiences, and the
software developed for them, to create an ecosystem of applications that will both serve the needs of the
AnVIL and interoperate with other NIH data resources. We will accomplish this through the following Aims:
● Aim 1 (Software Engineering): Leverage existing software capabilities to create tools for storing,
sharing, and analyzing AnVIL datasets at unlimited scale. During the past five years, our groups
have created a suite of modular and open source software capabilities that address key needs in
genomic data science. We will leverage these existing capabilities and extend them in novel directions
to address AnVIL-specific scientific goals relating to human genetics and functional genomics.
● Aim 2 (Data Engineering): Curate data and metadata resources so that they are easily
accessible. The AnVIL will not only be a suite of software services, but also a vast repository of
genotypic and phenotypic information. For this resource to be usable by the community, it must be
organized, curated, and made accessible. We will accomplish this by processing genomic datasets
using a consistent set of best-practices pipelines, and mapping phenotypes to a common data model.
● Aim 3 (Operations): Stand up and support a data environment for the AnVIL community, and
integrate it with other NIH resources as part of a federated NIH-wide genomic data commons.
The modular components of Aim 1 are critical building blocks, but they alone are not enough to meet
the needs of the AnVIL; they must also be stood up as services and integrated into a coherent entity,
which we call a “data environment.” We propose to create an AnVIL data environment that will enable
researchers to access datasets in a secure, compliant, and facile manner.
The guiding principle of these efforts is that progress in genomic science will happen most rapidly if there is a
diversity of solutions created by a plurality of groups. Towards that end, our approach to engineering the
software components of Aim 1, curating the datasets of Aim 2, and operating the software services of Aim 3 is
to catalyze an ecosystem of activity around the AnVIL. Our proposal focuses not only on creating and
operating software services ourselves, but also on incorporating third-party solutions. We propose to
accomplish this by architecting the AnVIL data environment according to the following principles: (i) modularity,
(ii) openness, (iii) community engagement, (iv) standardization, and (v) interoperability.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Robert J Carroll
AnVIL Clinical Environment for Innovation and Translation (ACE-IT)
- Grant number: 10747551
- Fiscal year: 2023
- Funding amount: $4.5 million
- Project type:
Data Management and Portal for the INCLUDE (DAPI) Project
- Grant number: 10697338
- Fiscal year: 2020
- Funding amount: $4.5 million
- Project type:
Advancing Image Data Interoperability and Standards within an NIH Ecosystem (AIDISNE): A CHOP, FlyWheel, and Seven Bridges Integration Demonstration Project
- Grant number: 10690302
- Fiscal year: 2020
- Funding amount: $4.5 million
- Project type:
Data Management and Portal for the INCLUDE (DAPI) Project
- Grant number: 10264912
- Fiscal year: 2020
- Funding amount: $4.5 million
- Project type:
User-ready tools and scalable workflows for INCLUDE datasets in the cloud: advancing brain imaging data management and analytics
- Grant number: 10406678
- Fiscal year: 2020
- Funding amount: $4.5 million
- Project type:
Data Management and Portal for the INCLUDE (DAPI) Project
- Grant number: 10472037
- Fiscal year: 2020
- Funding amount: $4.5 million
- Project type:
Similar overseas grants
All of Us Research Program Heartland Consortium (AoURP-HC)
- Grant number: 10871732
- Fiscal year: 2023
- Funding amount: $4.5 million
- Project type:
Integrating genomic and nongenomic risk for coronary artery disease
- Grant number: 10681391
- Fiscal year: 2022
- Funding amount: $4.5 million
- Project type:
California Partnership for Personalized Nutrition
- Grant number: 10669429
- Fiscal year: 2022
- Funding amount: $4.5 million
- Project type:
Integrating genomic and nongenomic risk for coronary artery disease
- Grant number: 10524541
- Fiscal year: 2022
- Funding amount: $4.5 million
- Project type:
California Partnership for Personalized Nutrition
- Grant number: 10386527
- Fiscal year: 2021
- Funding amount: $4.5 million
- Project type: