A modular data analysis ecosystem using portable encapsulated projects
Basic information
- Grant number: 9751344
- Principal investigator:
- Amount: $393,200
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2018
- Funding country: United States
- Project period: 2018-08-01 to 2023-07-31
- Project status: Completed
- Source:
- Keywords: Adopted; Bioinformatics; Biomedical Research; Complex; Data; Data Analyses; Data Collection; Data Set; Ecosystem; Encapsulated; Environment; Goals; Human; Individual; Inherited; Institution; Knowledge; Link; Manuals; Process; Provider; Pythons; Research; Research Project Grants; Running; Series; Standardization; Structure; Surface; System; Techniques; base; bioinformatics tool; cluster computing; computerized data processing; data management; data sharing; innovation; insight; interest; next generation; novel; novel strategies; portability; tool; tool development
Project abstract
Project summary
Overview
As the amount of available data increases, it becomes more challenging to process it. Data processing is simple on the
surface: it is a mapping from data to analysis. Unfortunately, too often, this requires a unique structure for each combination
of dataset and analysis. This makes it difficult to do things like run several different analyses on one dataset, or plug several
different datasets into one analysis, because each connection structure must be defined manually.
To alleviate this challenge of linking data to tools, this proposal develops the concept of Portable Encapsulated Projects
(PEP) and a series of tools that read and process such projects. Essentially, the PEP format aims to standardize the
description of data collections, enabling both data providers and data users to communicate through the common interface
of a standard format. Practically, this means individuals who describe their projects using this format will immediately inherit
both greater portability for analysis as well as greater access to external complementary data. This link operates around a
simple, standard, extensible definition of a project.
Alongside the format, this proposal develops Python and R packages to provide a modular framework with a low barrier to
entry that makes it easy to build robust pipelines and other tools centered around the PEP format. This system presents a
new approach to organizing data-intensive biomedical research projects.
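The idea of a common project description can be illustrated with a minimal sketch. The example below is not the PEP specification and does not use the project's Python package; the sample table columns and the function names (load_project, count_by_protocol, list_inputs) are hypothetical and chosen only for illustration. It shows how two unrelated analyses can read the same standardized description instead of each needing its own connection to the dataset.

```python
import csv
import io

# Hypothetical, simplified project description: a sample table in CSV form.
# Illustrative only; this is NOT the actual PEP specification.
SAMPLE_TABLE = """sample_name,protocol,data_path
frog_1,RNA-seq,data/frog_1.fastq
frog_2,ATAC-seq,data/frog_2.fastq
"""


def load_project(name, sample_table_text):
    """Parse the standard description into a plain dict any tool can read."""
    samples = list(csv.DictReader(io.StringIO(sample_table_text)))
    return {"name": name, "samples": samples}


# Two independent "analyses" that depend only on the standard description,
# not on how a particular dataset happens to be organized.
def count_by_protocol(project):
    """Count how many samples use each protocol."""
    counts = {}
    for sample in project["samples"]:
        counts[sample["protocol"]] = counts.get(sample["protocol"], 0) + 1
    return counts


def list_inputs(project):
    """Collect the input file path of every sample."""
    return [sample["data_path"] for sample in project["samples"]]


project = load_project("demo", SAMPLE_TABLE)
print(count_by_protocol(project))
print(list_inputs(project))
```

Under this arrangement, adding a new dataset means writing one description, and adding a new analysis means writing one function that reads that description; neither requires a dataset-specific connection.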
Significance and innovation
This proposal sits at the interface of data management and bioinformatics tool development. While significant effort is
already dedicated to each of these individually, there has been less focus at the level of connecting the two. This proposal
will build a standardized interface between data and tools in bioinformatics, providing practical advances in formats and tools
to facilitate this interaction. This effort approaches computational projects in a novel way, and builds both concepts and tools
that can revolutionize bioinformatics research. The goal is not to develop new tools, but to make existing tools more easily
applied to existing data.
In computational research, a huge amount of effort is spent on data cleanup: preparing data for analysis. By facilitating the
connection from data to tools, this proposal will encourage re-analysis of existing data with novel analysis techniques, leading to new
discoveries. It will also make it easier to analyze new data in tandem with existing data, increasing the value of both. It will
contribute to reusability, larger-scale analysis, portable computing environments, and data sharing.
There is increasing interest in data sharing and accessibility across scientific domains, and this proposal will facilitate both.
Early versions have already been adopted for both local compute and cluster computing at four different research institutions, and
as the project matures, it will unite various research environments around a common data description. This will make it
easier to share data and tools across users, research groups, and institutions.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Nathan Sheffield
Novel methods for large-scale genomic interval comparison
- Grant number: 10678947
- Fiscal year: 2022
- Funding amount: $393,200
- Project category:
Novel methods for large-scale genomic interval comparison
- Grant number: 10842040
- Fiscal year: 2022
- Funding amount: $393,200
- Project category:
A modular data analysis ecosystem using portable encapsulated projects
- Grant number: 10468680
- Fiscal year: 2018
- Funding amount: $393,200
- Project category:
A modular data analysis ecosystem using portable encapsulated projects
- Grant number: 10019399
- Fiscal year: 2018
- Funding amount: $393,200
- Project category:
A modular data analysis ecosystem using portable encapsulated projects
- Grant number: 10224819
- Fiscal year: 2018
- Funding amount: $393,200
- Project category:
Similar overseas grants
Conference: Global Bioinformatics Education Summit 2024 — Energizing Communities to Power the Bioeconomy Workforce
- Grant number: 2421267
- Fiscal year: 2024
- Funding amount: $393,200
- Project category: Standard Grant
Open Access Block Award 2024 - EMBL - European Bioinformatics Institute
- Grant number: EP/Z532678/1
- Fiscal year: 2024
- Funding amount: $393,200
- Project category: Research Grant
Conference: The 9th Workshop on Biostatistics and Bioinformatics
- Grant number: 2409876
- Fiscal year: 2024
- Funding amount: $393,200
- Project category: Standard Grant
PDB Management by The Research Collaboratory for Structural Bioinformatics
- Grant number: 2321666
- Fiscal year: 2024
- Funding amount: $393,200
- Project category: Cooperative Agreement
PAML 5: A friendly and powerful bioinformatics resource for phylogenomics
- Grant number: BB/X018571/1
- Fiscal year: 2024
- Funding amount: $393,200
- Project category: Research Grant
Building a Bioinformatics Ecosystem for Agri-Ecologists
- Grant number: BB/X018768/1
- Fiscal year: 2023
- Funding amount: $393,200
- Project category: Research Grant
Integrative viral genomics and bioinformatics platform
- Grant number: MC_UU_00034/5
- Fiscal year: 2023
- Funding amount: $393,200
- Project category: Intramural
Collaborative Research: IIBR: Innovation: Bioinformatics: Linking Chemical and Biological Space: Deep Learning and Experimentation for Property-Controlled Molecule Generation
- Grant number: 2318829
- Fiscal year: 2023
- Funding amount: $393,200
- Project category: Continuing Grant
Planning Proposal: CREST Center in Bioinformatics
- Grant number: 2334642
- Fiscal year: 2023
- Funding amount: $393,200
- Project category: Standard Grant