Collaborative Proposal: EarthCube Integration: Pangeo: An Open Source Big Data Climate Science Platform
Basic Information
- Award number: 1740633
- Amount: $466,900
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2017
- Funding country: United States
- Period: 2017-09-01 to 2022-08-31
- Status: Completed
Abstract
Climate, weather, and ocean simulations (Earth System Models; ESMs) are crucial tools for the study of the Earth system, providing both scientific insight into fundamental dynamics and valuable practical predictions about Earth's future. Continuous increases in ESM spatial resolution have led to more realistic, more detailed physical representations of Earth system processes, while the proliferation of statistical ensembles of simulations has greatly enhanced understanding of uncertainty and internal variability. Hand in hand with this progress has come the generation of petabytes of simulation data, creating huge downstream challenges for geoscience researchers. Mining ESM output for scientific insight has itself become a serious Big Data problem. Existing Big Data tools cannot easily be applied to the analysis of ESM data, leading to a growing crisis across a wide range of geoscience fields. This is exactly the sort of problem EarthCube was conceived to address. The project will integrate a suite of open-source software tools (the "Pangeo Platform") that together can tackle petabyte-scale ESM datasets. In addition, training and educational materials for these tools will be developed, distributed widely online, and integrated into existing educational curricula at Columbia University. A workshop at NCAR in the final year will help inform the broader community about Pangeo, and collaborators at other US climate modeling centers will encourage their scientists to adopt and participate in the Pangeo project.
Beyond climate and related fields, multidimensional numeric arrays are common in many areas of science (e.g., astronomy, materials science, microscopy). However, the dominant Big Data software stack (Hadoop) is oriented toward tabular, text-based data structures and cannot easily ingest petabyte-scale multidimensional numeric arrays. The proposed work thus has the potential to transform data science itself, enabling analysis of such datasets with a novel, highly scalable, highly flexible tool whose syntax is familiar to disciplinary researchers.
The core technologies are two Python packages: Dask, a flexible parallel computing library that provides dynamic task scheduling, and XArray, a wrapper layer over Dask data structures that provides user-friendly metadata tracking, indexing, and visualization. These tools interface with netCDF datasets and understand the Climate and Forecast (CF) metadata conventions. They will be brought to bear on four high-impact geoscience use cases in atmospheric science, land-surface hydrology, and physical oceanography. For each use case, disciplinary scientists will define workflows and work with computational scientists to demonstrate, benchmark, and optimize the software. The resulting software improvements will be contributed back to the upstream open-source projects, ensuring the long-term sustainability of the platform. The end result will be a robust new software toolkit for climate science and beyond, one that strengthens the data science component of EarthCube. Implementation of these tools on the cloud will also be tested, taking advantage of an agreement between commercial cloud service providers and NSF for the BIGDATA solicitation.
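To make the chunk-and-combine idea behind Dask's array reductions more concrete, here is a minimal pure-Python sketch, assuming a small (time, x) field stored as one list row per time step. It is not Dask's or XArray's API: the function names, the nested-list layout, and the ("time", "x") dimension labels are hypothetical, and the standard library's thread pool stands in for Dask's scheduler. Each time chunk becomes an independent task that returns partial sums, and a final combine step turns those into a time mean.

```python
from concurrent.futures import ThreadPoolExecutor


def split_time_chunks(field, chunk_size):
    """Split a (time, x) field, stored as one row per time step, into time chunks."""
    return [field[i:i + chunk_size] for i in range(0, len(field), chunk_size)]


def partial_sums(block):
    """Task for one chunk: column-wise sums over its time steps plus the step count."""
    sums = [sum(column) for column in zip(*block)]
    return sums, len(block)


def chunked_time_mean(field, dims, chunk_size, workers=4):
    """Mean over the leading 'time' dimension, computed chunk by chunk.

    `dims` names the axes (hypothetical labels standing in for XArray-style
    metadata); this sketch only supports 'time' as the leading dimension.
    """
    if dims[0] != "time":
        raise ValueError("this sketch expects 'time' as the leading dimension")
    blocks = split_time_chunks(field, chunk_size)
    # Each chunk is an independent task; the pool runs them concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(partial_sums, blocks))
    # Combine step: add the per-chunk sums and counts, then divide once.
    n_steps = sum(count for _, count in results)
    totals = [sum(column) for column in zip(*(sums for sums, _ in results))]
    return [total / n_steps for total in totals]


if __name__ == "__main__":
    demo = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(chunked_time_mean(demo, ("time", "x"), chunk_size=2))
```

For the three-step demo field, `chunked_time_mean(demo, ("time", "x"), chunk_size=2)` returns `[3.0, 4.0]`. Returning (sums, count) pairs instead of per-chunk means keeps the combination exact when the last chunk is shorter than the others. With the real libraries, a comparable workflow would typically open a netCDF file lazily, for example with `xarray.open_dataset(path, chunks={"time": 100})`, and then call `.mean("time").compute()` on a variable, letting Dask build and schedule the per-chunk tasks.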
Project Outcomes
Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
MetPy: A Meteorological Python Library for Data Analysis and Visualization
- DOI: 10.1175/bams-d-21-0125.1
- Published: 2022
- Journal: Bulletin of the American Meteorological Society
- Impact factor: 8
- Authors: May, Ryan M.;Goebbert, Kevin H.;Thielen, Jonathan E.;Leeman, John R.;Camron, M. Drew;Bruick, Zachary;Bruning, Eric C.;Manser, Russell P.;Arms, Sean C.;Marsh, Patrick T.
- Corresponding author: Marsh, Patrick T.
Other publications by Ryan May
matplotlib/matplotlib v3.1.3
- Published: 2020
- Impact factor: 0
- Authors: Thomas A Caswell;Michael Droettboom;Antony Lee;John Hunter;Eric Firing;David Stansby;J. Klymak;Tim Hoffmann;Elliott Sales de Andrade;Nelle Varoquaux;Jens Hedegaard Nielsen;Benjamin Root;Phil Elson;Ryan May;Darren Dale;Jae;Jouni K. Seppänen;Damon McDougall;Andrew D. Straw;Paul Hobson;Christoph Gohlke;Tony S Yu;Eric Ma;Adrien F. Vincent;Steven Silvester;Charlie Moad;Nikita Kniazev;P. Ivanov;Elan Ernest;Jan Katins
- Corresponding author: Jan Katins
LOCAL CLIMATOLOGICAL DATA
- DOI: 10.1111/j.1600-0447.1947.tb03912.x
- Published: 1946
- Impact factor: 6.7
- Authors: Thomas A Caswell;Michael Droettboom;Antony Lee;Elliott Sales de Andrade;John Hunter;Tim Hoffmann;Eric Firing;Jody Klymak;David Stansby;Nelle Varoquaux;Jens Hedegaard Nielsen;Benjamin Root;Ryan May;Phil Elson;Jouni K. Seppänen;Darren Dale;Jae;Damon McDougall;Andrew D. Straw;Paul Hobson;Christoph Gohlke;Tony S Yu;Eric Ma;Adrien F. Vincent;Hannah;Steven Silvester;Charlie Moad;Nikita Kniazev;Elan Ernest;P. Ivanov
- Corresponding author: P. Ivanov
matplotlib/matplotlib: REL: v3.2.2
- DOI: 10.5281/zenodo.3898017
- Published: 2020
- Impact factor: 4.1
- Authors: Thomas A Caswell;Michael Droettboom;Antony Lee;John Hunter;Eric Firing;Elliott Sales de Andrade;Tim Hoffmann;David Stansby;Jody Klymak;Nelle Varoquaux;Jens Hedegaard Nielsen;Benjamin Root;Phil Elson;Ryan May;Darren Dale;Jae;Jouni K. Seppänen;Damon McDougall;Andrew D. Straw;Paul Hobson;Christoph Gohlke;Tony S Yu;Eric Ma;Adrien F. Vincent;Steven Silvester;Charlie Moad;Nikita Kniazev;Hannah;Elan Ernest
- Corresponding author: Elan Ernest
matplotlib/matplotlib: REL: v3.3.4
- Published: 2021
- Impact factor: 0
- Authors: Thomas A Caswell;Michael Droettboom;Antony Lee;Elliott Sales de Andrade;John Hunter;Eric Firing;Tim Hoffmann;Jody Klymak;David Stansby;Nelle Varoquaux;Jens Hedegaard Nielsen;Benjamin Root;Ryan May;Phil Elson;Jouni K. Seppänen;Darren Dale;Jae;Damon McDougall;Andrew D. Straw;Paul Hobson;Christoph Gohlke;Tony S Yu;Eric Ma;Adrien F. Vincent;Hannah;Steven Silvester;Charlie Moad;Nikita Kniazev;Elan Ernest;P. Ivanov
- Corresponding author: P. Ivanov
matplotlib: matplotlib v1.5.1
- Published: 2016
- Impact factor: 0
- Authors: Michael Droettboom;Thomas A Caswell;Eric Firing;Damon McDougall;P. Ivanov;M. Giuca;J. K. Seppänen;J. Evans;Cimarron;Steven Silvester;Jens Nielsen;Charles W. Moad;mdehoon;Paul Hobson;Jae;A. Straw;John D. Hunter;Ian Thomas;Federico Ariza;Thomas Hisch;Jeff Whitaker;Phil Elson;Benjamin Root;Eric J. Ma;Tony S Yu;D. Dale;Nelle Varoquaux;Christoph Gohlke;Peter Würtz;Ryan May
- Corresponding author: Ryan May
Other grants by Ryan May
Elements: Scaling MetPy to Big Data Workflows in Meteorology and Climate Science
- Award number: 2103682
- Fiscal year: 2021
- Amount: $466,900
- Award type: Standard Grant
SI2-SSE: MetPy - A Python GEMPAK Replacement for Meteorological Data Analysis
- Award number: 1740315
- Fiscal year: 2017
- Amount: $466,900
- Award type: Standard Grant
Similar international grants
EarthCube Data Capabilities: Collaborative Proposal: Reducing Time-To-Science in the Earth Sciences: Annotations to foster convergence, inclusion, and credit
- Award number: 2246427
- Fiscal year: 2022
- Amount: $466,900
- Award type: Standard Grant
EarthCube Data Capabilities: Collaborative Proposal: Reducing Time-To-Science in the Earth Sciences: Annotations to foster convergence, inclusion, and credit
- Award number: 1928341
- Fiscal year: 2019
- Amount: $466,900
- Award type: Standard Grant
EarthCube Data Capabilities: Collaborative Proposal: Assimilative Mapping of Geospace Observations
- Award number: 1928327
- Fiscal year: 2019
- Amount: $466,900
- Award type: Standard Grant
EarthCube Science-Enabling Data Capabilities: Collaborative Proposal: Extending Ocean Drilling Pursuits [eODP]: Microfossils and Stratigraphy
- Award number: 1928362
- Fiscal year: 2019
- Amount: $466,900
- Award type: Standard Grant
EarthCube Data Capabilities: Collaborative Proposal: Reducing Time-To-Science in the Earth Sciences: Annotations to foster convergence, inclusion, and credit
- Award number: 1928320
- Fiscal year: 2019
- Amount: $466,900
- Award type: Standard Grant
EarthCube Data Capabilities: Collaborative Proposal: Reducing Time-To-Science in the Earth Sciences: Annotations to foster convergence, inclusion, and credit
- Award number: 1928333
- Fiscal year: 2019
- Amount: $466,900
- Award type: Standard Grant
EarthCube Data Capabilities: Collaborative Proposal: Broadening Community Use and Adoption of StraboSpot
- Award number: 1928348
- Fiscal year: 2019
- Amount: $466,900
- Award type: Standard Grant
EarthCube Data Capabilities: Collaborative Proposal: Broadening Community Use and Adoption of StraboSpot
- Award number: 1928389
- Fiscal year: 2019
- Amount: $466,900
- Award type: Standard Grant
EarthCube Science-Enabling Data Capabilities: Collaborative Proposal: Extending Ocean Drilling Pursuits [eODP]: Microfossils and Stratigraphy
- Award number: 1927866
- Fiscal year: 2019
- Amount: $466,900
- Award type: Standard Grant
EarthCube Data Capabilities: Collaborative Proposal: Reducing Time-To-Science in the Earth Sciences: Annotations to foster convergence, inclusion, and credit
- Award number: 1928318
- Fiscal year: 2019
- Amount: $466,900
- Award type: Standard Grant