Data Science Core
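Basic Information
- Grant number: 10241479
- Principal investigator:
- Amount: $452,900
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2017
- Funding country: United States
- Project period: 2017-09-25 to 2022-08-31
- Project status: Completed
- Source: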
- Keywords: Address; Algorithmic Software; Algorithms; Architecture; Atlases; Award; Back; Behavior; Behavioral; Big Data; Biological Models; Biology; Brain; Code; Computer software; Data; Data Analytics; Data Collection; Data Management Resources; Data Science; Data Science Core; Data Set; Data Sources; Database Management Systems; Decision Making; Ecosystem; Ensure; Environment; Experimental Designs; FAIR principles; Face; Feedback; Foundations; Funding; Goals; Graph; Image; Individual; Infrastructure; Intake; International; Lead; Libraries; Licensing; Link; Machine Learning; Mathematics; Measures; Memory; Metadata; Modality; Modeling; Neurosciences; Phylogeny; Plants; Privatization; Process; Reproducibility; Research; Resources; Running; Science; Services; Small Business Innovation Research Grant; Specific qualifier value; Surveys; System; Technology; Testing; United States National Institutes of Health; Visualization; Zebrafish; analysis pipeline; application programming interface; base; cloud based; computerized data processing; connectome; data analysis pipeline; data management; data quality; data repository; design; digital; digital object identifier; experimental study; genomic data; laptop; member; novel; open source; portability; statistics; terabyte
Project Abstract
Data Science Core: Abstract
Achieving the scientific goals of the Overall Research Strategy requires a significant effort and advancement in data
science for neuroscience. In particular, scientific progress depends on novel experimental design, data collection and
processing (as described in Projects 1 and 2), and novel analysis and models (as described in Project 3), which lead to
general principles to be tested (as described in Project 4). The fundamental goal of the Data Science Core is to accelerate
the process connecting the raw data collected in Projects 1, 2, and 4 to the analyses used to obtain data derivatives, which
can then be used to build models in Project 3, and extend them in Project 4. The two main challenges we face in accelerating
these links are big data and reproducibility. First, the data collected are too large to fit into memory, or even on disk, with
each experiment on the order of one terabyte (TB) and the entire dataset amounting to hundreds of TB or more. Therefore, the
classic paradigm of running all analyses in MATLAB on locally stored data is not sufficient. The solution to this is
twofold: (1) build a cloud data management system, so that all consortium members can quickly access and analyze the
data, and (2) build scalable algorithms, so that different individuals can apply them to these big data. The cloud data
management system will be built on the infrastructure of the Open Connectome Project [1], which was originally developed
to host data on institutional resources. In the last year, the team has matured to become NeuroData (http://neurodata.io),
porting all the infrastructure to the commercial cloud, and already hosting 20+ datasets comprising 50+ TB, including all
three scales of analysis proposed here (http://neurodata.io). The scalable algorithms will be based on another project from
NeuroData called FlashX (http://flashx.io). FlashX is a C++ graph analytics and machine learning library, designed to run
analytics on arbitrarily large data using only a single machine (not a cluster) [2,3], and the recent recipient of a DARPA
SBIR award for commercialization. We will use FlashX as a backend to support all the algorithms for processing behavior and
imaging data. Second, this is a team effort, so sharing analyses and derivatives and keeping track of metadata will be
important. The solution to this is to build a comprehensive scientific environment in the cloud that enables sharing of
entire “digital experiments”, linking to the data and ensuring that the entire analysis pipeline can be trivially run and
extended by anyone, anywhere. This system will extend NeuroData’s “Science in the Cloud”
(http://scienceinthe.cloud) [4,5], which recently received private funding to professionalize. Our entire system is, and
will continue to be, open source, portable, and reproducible, and will use and extend best practices of data science and
FAIR (Findable, Accessible, Interoperable, and Re-usable) data management. Completing all the aims in this Data Science
Core will not only enable and accelerate the scientific progress addressed by this proposal, but will also establish new standards
in data science that can be immediately applied to all other U19 efforts, as well as many other efforts within and outside NIH and
even the international science effort at large.
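As a rough illustration of the single-machine, larger-than-memory style of analysis the abstract describes, the sketch below computes a mean and variance over an on-disk array one chunk at a time. It is not FlashX code and does not use the FlashX API; the function name `streaming_mean_var`, the flat binary file layout, and the chunk size are assumptions made for this example only.

```python
import numpy as np


def streaming_mean_var(path, n_values, chunk_size=1_000_000, dtype=np.float64):
    """Mean and population variance of a flat binary file, read chunk by chunk.

    Hypothetical helper: assumes `path` holds `n_values` (> 0) raw values of `dtype`.
    """
    # Memory-map the file so only the chunk being processed is loaded into RAM.
    data = np.memmap(path, dtype=dtype, mode="r", shape=(n_values,))
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the running mean
    for start in range(0, n_values, chunk_size):
        chunk = np.asarray(data[start:start + chunk_size], dtype=np.float64)
        n_b = chunk.size
        mean_b = chunk.mean()
        m2_b = ((chunk - mean_b) ** 2).sum()
        # Merge the chunk statistics into the running totals
        # (parallel variance update of Chan et al.).
        delta = mean_b - mean
        total = count + n_b
        mean += delta * n_b / total
        m2 += m2_b + delta ** 2 * count * n_b / total
        count = total
    return mean, m2 / count
```

Only one chunk is resident in memory at a time, so the same loop would work on a file far larger than RAM; a production system would add parallel I/O and richer algorithms on top of this pattern.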
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by JOSHUA T VOGELSTEIN
Other grants by JOSHUA T VOGELSTEIN
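Similar National Natural Science Foundation of China (NSFC) Grants
Research on high-throughput, low-latency decoding algorithms for non-binary LDPC codes and their software architecture
- Grant number: 62301029
- Approval year: 2023
- Funding amount: 300,000 CNY
- Project category: Young Scientists Fund
Manifold optimization problems in deep learning: algorithm design and development of solver software packages
- Grant number: 12301408
- Approval year: 2023
- Funding amount: 300,000 CNY
- Project category: Young Scientists Fund
GPU algorithms and heterogeneous parallel computing for first-order energy derivatives: development of the WESP software and its porting to domestic heterogeneous platforms
- Grant number: 22373112
- Approval year: 2023
- Funding amount: 500,000 CNY
- Project category: General Program
Theory and algorithms for AI-enabled industrial software driven by coupled mechanisms and data
- Grant number: 52335001
- Approval year: 2023
- Funding amount: 2,300,000 CNY
- Project category: Key Program
Research on quantum software optimization techniques for quantum simulation algorithms
- Grant number: 62302395
- Approval year: 2023
- Funding amount: 300,000 CNY
- Project category: Young Scientists Fund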
Similar Overseas Grants
Brain Digital Slide Archive: An Open Source Platform for data sharing and analysis of digital neuropathology
- Grant number: 10735564
- Fiscal year: 2023
- Funding amount: $452,900
- Project category:
An acquisition and analysis pipeline for integrating MRI and neuropathology in TBI-related dementia and VCID
- Grant number: 10810913
- Fiscal year: 2023
- Funding amount: $452,900
- Project category:
Wearable Wireless Respiratory Monitoring System that Detects and Predicts Opioid Induced Respiratory Depression
- Grant number: 10784983
- Fiscal year: 2023
- Funding amount: $452,900
- Project category:
Leveraging artificial intelligence/machine learning-based technology to overcome specialized training and technology barriers for the diagnosis and prognostication of colorectal cancer in Africa
- Grant number: 10712793
- Fiscal year: 2023
- Funding amount: $452,900
- Project category:
A visualization interface for BRAIN single cell data, integrating transcriptomics, epigenomics and spatial assays
- Grant number: 10643313
- Fiscal year: 2023
- Funding amount: $452,900
- Project category: