Collaborative Proposal: EarthCube Integration: Pangeo: An Open Source Big Data Climate Science Platform
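Basic information
- Award number: 1740648
- Principal investigator:
- Amount: $736,700
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2017
- Funding country: United States
- Project period: 2017-09-01 to 2021-08-31
- Project status: Completed
- Source:
- Keywords:
Project abstract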
Climate, weather, and ocean simulations (Earth System Models; ESMs) are crucial tools for the study of the Earth system, providing both scientific insight into fundamental dynamics and valuable practical predictions about Earth's future. Continuous increases in ESM spatial resolution have led to more realistic, more detailed physical representations of Earth system processes, while the proliferation of statistical ensembles of simulations has greatly enhanced understanding of uncertainty and internal variability. Hand in hand with this progress has come the generation of petabytes of simulation data, resulting in huge downstream challenges for geoscience researchers. The task of mining ESM output for scientific insights has now itself become a serious Big Data problem. Existing Big Data tools cannot easily be applied to the analysis of ESM data, leading to a building crisis across a wide range of geoscience fields. This is exactly the sort of problem EarthCube was conceived to address. The project will integrate a suite of open-source software tools (the "Pangeo Platform") which together can tackle petabyte-scale ESM datasets. Additionally, training and educational materials for these tools will be developed, distributed widely online, and integrated into existing educational curricula at Columbia. A workshop at NCAR in the final year will help inform the broader community about Pangeo. Collaborators at other US climate modeling centers will encourage adoption and participation in the Pangeo project by their scientists. Beyond climate and related fields, multidimensional numeric arrays are common in many fields of science (e.g., astronomy, materials science, microscopy). However, the dominant Big Data software stack (Hadoop) is oriented towards tabular text-based data structures and cannot easily ingest petabyte-scale multidimensional numeric arrays.
The proposed work thus has the potential to transform Data Science itself, enabling analysis of such datasets via a novel, highly scalable, highly flexible tool with a syntax familiar to disciplinary researchers. The core technologies are the Python packages Dask, a flexible parallel computing library which provides dynamic task scheduling, and XArray, a wrapper layer over Dask data structures which provides user-friendly metadata tracking, indexing, and visualization. These tools interface with netCDF datasets and understand CF conventions. They will be brought to bear on four high-impact Geoscience Use Cases in atmospheric science, land-surface hydrology, and physical oceanography. Disciplinary scientists will define workflows for each use case and interact with computational scientists to demonstrate, benchmark, and optimize the software. The resulting software improvements will be contributed back to the upstream open source projects, ensuring long-term sustainability of the platform. The end result will be a robust new software toolkit for climate science and beyond. This toolkit will enhance the Data Science aspect of EarthCube. Implementation of these tools on the cloud will also be tested, taking advantage of an agreement between commercial cloud service providers and NSF for the BIGDATA solicitation.
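The chunk-and-combine idea behind the task-based parallelism mentioned in the abstract can be sketched in plain Python. This is only an illustrative toy built on the standard library, not the API of Dask or XArray; the sample values and every function name below are hypothetical and do not come from the project.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data: 12 monthly temperature values for one grid point.
monthly_temps = [2.0, 3.5, 7.0, 11.5, 16.0, 20.0, 22.5, 22.0, 18.0, 12.5, 7.0, 3.0]


def split_into_chunks(values, chunk_size):
    """Split a sequence into consecutive chunks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def partial_sum(chunk):
    """Per-chunk task: return (sum, count) so partial results can be combined."""
    return sum(chunk), len(chunk)


def chunked_mean(values, chunk_size=4, max_workers=3):
    """Compute a mean by running independent per-chunk tasks, then combining them."""
    chunks = split_into_chunks(values, chunk_size)
    # Each chunk is an independent task, so a pool of workers can process them
    # concurrently; only the small (sum, count) pairs are gathered at the end.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


if __name__ == "__main__":
    print(chunked_mean(monthly_temps))
```

Because each task returns only a small summary of its chunk, the full array never has to be held by a single worker, which is the property that makes this pattern attractive for datasets too large for memory.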
Project outcomes
Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Future Global Convective Environments in CMIP6 Models
- DOI: 10.1029/2021ef002277
- Publication date: 2021-12-01
- Journal:
- Impact factor: 8.2
- Authors: Lepore, Chiara; Abernathey, Ryan; Tippett, Michael K.
- Corresponding author: Tippett, Michael K.
Other publications by Ryan Abernathey
Open Code Policy for NASA Space Science: A Perspective from NASA-Supported Ocean Modeling and Ocean Data Analysis
- DOI:
- Publication date: 2018
- Journal:
- Impact factor: 0
- Authors: S. Gille; Ryan Abernathey; T. Chereskin; B. Cornuelle; Patrick Heimbach; M. Mazloff; Cesar B. Rocha; Saulo Soares; Maike Sonnewald; Bia Villas Boas; Jinbo Wang
- Corresponding author: Jinbo Wang
Rapid changes in terrestrial carbon dioxide uptake captured in near-real time from a geostationary satellite: The ALIVE framework
- DOI: 10.1016/j.rse.2025.114759
- Publication date: 2025-07-01
- Journal:
- Impact factor: 11.400
- Authors: Danielle Losos; Sadegh Ranjbar; Sophie Hoffman; Ryan Abernathey; Ankur R. Desai; Jason Otkin; Helin Zhang; Youngryel Ryu; Paul C. Stoy
- Corresponding author: Paul C. Stoy
THE PANGEO BIG DATA ECOSYSTEM AND ITS USE AT CNES
- DOI:
- Publication date: 2019
- Journal:
- Impact factor: 0
- Authors: Guillaume Eynard; Ryan Abernathey; Joseph Hamman; Aurelien Ponte; Willi Rath
- Corresponding author: Willi Rath
Other grants held by Ryan Abernathey
EarthCube Data Capabilities: A Cloud-Native Data Repository for the Geoscience Community
- Award number: 2026932
- Fiscal year: 2020
- Award amount: $736,700
- Award type: Standard Grant
Collaborative Research: Ocean Transport and Eddy Energy
- Award number: 1912325
- Fiscal year: 2019
- Award amount: $736,700
- Award type: Standard Grant
Collaborative Research: Framework: Data: Toward Exascale Community Ocean Circulation Modeling
- Award number: 1835778
- Fiscal year: 2018
- Award amount: $736,700
- Award type: Standard Grant
CAREER: Evolution of Ocean Mesoscale Turbulence in a Changing Climate
- Award number: 1553593
- Fiscal year: 2016
- Award amount: $736,700
- Award type: Continuing Grant
Collaborative Research: The Upper Branch of the Southern Ocean Overturning in the Southern Ocean State Estimate: Water Mass Transformation and the 3-D Residual Circulation
- Award number: 1357133
- Fiscal year: 2014
- Award amount: $736,700
- Award type: Standard Grant
Similar overseas grants
EarthCube Data Capabilities: Collaborative Proposal: Reducing Time-To-Science in the Earth Sciences: Annotations to foster convergence, inclusion, and credit
- Award number: 2246427
- Fiscal year: 2022
- Award amount: $736,700
- Award type: Standard Grant
EarthCube Data Capabilities: Collaborative Proposal: Reducing Time-To-Science in the Earth Sciences: Annotations to foster convergence, inclusion, and credit
- Award number: 1928341
- Fiscal year: 2019
- Award amount: $736,700
- Award type: Standard Grant
EarthCube Data Capabilities: Collaborative Proposal: Assimilative Mapping of Geospace Observations
- Award number: 1928327
- Fiscal year: 2019
- Award amount: $736,700
- Award type: Standard Grant
EarthCube Science-Enabling Data Capabilities: Collaborative Proposal: Extending Ocean Drilling Pursuits [eODP]: Microfossils and Stratigraphy
- Award number: 1928362
- Fiscal year: 2019
- Award amount: $736,700
- Award type: Standard Grant
EarthCube Data Capabilities: Collaborative Proposal: Reducing Time-To-Science in the Earth Sciences: Annotations to foster convergence, inclusion, and credit
- Award number: 1928320
- Fiscal year: 2019
- Award amount: $736,700
- Award type: Standard Grant
EarthCube Data Capabilities: Collaborative Proposal: Reducing Time-To-Science in the Earth Sciences: Annotations to foster convergence, inclusion, and credit
- Award number: 1928333
- Fiscal year: 2019
- Award amount: $736,700
- Award type: Standard Grant
EarthCube Data Capabilities: Collaborative Proposal: Broadening Community Use and Adoption of StraboSpot
- Award number: 1928348
- Fiscal year: 2019
- Award amount: $736,700
- Award type: Standard Grant
EarthCube Data Capabilities: Collaborative Proposal: Broadening Community Use and Adoption of StraboSpot
- Award number: 1928389
- Fiscal year: 2019
- Award amount: $736,700
- Award type: Standard Grant
EarthCube Science-Enabling Data Capabilities: Collaborative Proposal: Extending Ocean Drilling Pursuits [eODP]: Microfossils and Stratigraphy
- Award number: 1927866
- Fiscal year: 2019
- Award amount: $736,700
- Award type: Standard Grant
EarthCube Data Capabilities: Collaborative Proposal: Reducing Time-To-Science in the Earth Sciences: Annotations to foster convergence, inclusion, and credit
- Award number: 1928318
- Fiscal year: 2019
- Award amount: $736,700
- Award type: Standard Grant