The AnVIL Data Ecosystem
Basic Information
- Award number: 9788512
- Principal investigator:
- Amount: $4.9545 million
- Host institution:
- Host institution country: United States
- Project type:
- Fiscal year: 2018
- Funding country: United States
- Project period: 2018-09-19 to 2023-06-30
- Status: Completed
- Source:
- Keywords: Address, Advisory Committees, All of Us Research Program, Architecture, Atlases, Catalogs, Cells, Chicago, Cloud Computing, Collaborations, Communities, Computer software, Consultations, Cost Control, Data, Data Engineering, Data Science, Data Set, Documentation, Ecosystem, Educational workshop, Ensure, Environment, Genomic Data Commons, Genotype, Goals, Human, Human Genetics, Institutes, Lead, Licensing, Metadata, Modeling, National Human Genome Research Institute, Phenotype, Play, Research, Research Personnel, Resources, Role, Secure, Security, Services, Software Engineering, Standardization, System, United States National Institutes of Health, Washington, Work, cloud based, data access, data modeling, data resource, data sharing, digital, experience, functional genomics, genome sciences, genomic data, indexing, innovation, interoperability, member, novel, novel strategies, open source, operation, phenotypic data, repository, software development, software systems, tool
Project Abstract
The AnVIL Data Ecosystem
Project Summary / Abstract
In this proposal, we bring together a unified team with a strong track record of developing secure and scalable
software systems to support flagship scientific efforts, such as the All of Us Research Program, the Genomic
Data Commons (GDC), and the Human Cell Atlas (HCA). Our group will leverage these experiences, and the
software developed for them, to create an ecosystem of applications that will both serve the needs of the
AnVIL and interoperate with other NIH data resources. We will accomplish this through the following Aims:
● Aim 1 (Software Engineering): Leverage existing software capabilities to create tools for storing,
sharing, and analyzing AnVIL datasets at unlimited scale. During the past five years, our groups
have created a suite of modular and open source software capabilities that address key needs in
genomic data science. We will leverage these existing capabilities and extend them in novel directions
to address AnVIL-specific scientific goals relating to human genetics and functional genomics.
● Aim 2 (Data Engineering): Curate data and metadata resources so that they are easily
accessible. The AnVIL will not only be a suite of software services, but also a vast repository of
genotypic and phenotypic information. For this resource to be usable by the community, it must be
organized, curated, and made accessible. We will accomplish this by processing genomic datasets
using a consistent set of best-practices pipelines, and mapping phenotypes to a common data model.
● Aim 3 (Operations): Stand up and support a data environment for the AnVIL community, and
integrate it with other NIH resources as part of a federated NIH-wide genomic data commons.
The modular components of Aim 1 are critical building blocks, but they alone are not enough to meet
the needs of the AnVIL; they must also be stood up as services and integrated into a coherent entity,
which we call a “data environment.” We propose to create an AnVIL data environment that will enable
researchers to access datasets in a secure, compliant, and facile manner.
The guiding principle of these efforts is that progress in genomic science will happen most rapidly if there is a
diversity of solutions created by a plurality of groups. Towards that end, our approach to engineering the
software components of Aim 1, curating the datasets of Aim 2, and operating the software services of Aim 3 is
to catalyze an ecosystem of activity around the AnVIL. Our proposal focuses not only on creating and
operating software services ourselves, but also on incorporating third-party solutions. We propose to
accomplish this by architecting the AnVIL data environment according to the following principles: (i) modularity,
(ii) openness, (iii) community engagement, (iv) standardization, and (v) interoperability.
Project Outputs
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Other publications by Robert J Carroll
Other grants by Robert J Carroll
AnVIL Clinical Environment for Innovation and Translation (ACE-IT)
- Award number: 10747551
- Fiscal year: 2023
- Funding amount: $4.9545 million
- Project type:
Data Management and Portal for the INCLUDE (DAPI) Project
- Award number: 10697338
- Fiscal year: 2020
- Funding amount: $4.9545 million
- Project type:
Data Management and Portal for the INCLUDE (DAPI) Project
- Award number: 10264912
- Fiscal year: 2020
- Funding amount: $4.9545 million
- Project type:
Advancing Image Data Interoperability and Standards within an NIH Ecosystem (AIDISNE): A CHOP, FlyWheel, and Seven Bridges Integration Demonstration Project
- Award number: 10690302
- Fiscal year: 2020
- Funding amount: $4.9545 million
- Project type:
User-ready tools and scalable workflows for INCLUDE datasets in the cloud: advancing brain imaging data management and analytics
- Award number: 10406678
- Fiscal year: 2020
- Funding amount: $4.9545 million
- Project type:
Data Management and Portal for the INCLUDE (DAPI) Project
- Award number: 10472037
- Fiscal year: 2020
- Funding amount: $4.9545 million
- Project type:
Similar Overseas Grants
Toward a Political Theory of Bioethics: Participation, Representation, and Deliberation on Federal Bioethics Advisory Committees
- Award number: 0451289
- Fiscal year: 2005
- Funding amount: $4.9545 million
- Project type: Standard Grant