Elements: PASSPP: Provenance-Aware Scalable Seismic Data Processing with Portability

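Basic Information

  • Award number:
    1931352
  • Principal investigator:
  • Amount:
    $232,800
  • Institution:
  • Institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2019
  • Funding country:
    United States
  • Period:
    2019-11-01 to 2022-10-31
  • Status:
    Completed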

Project Abstract

Most of what we know about the Earth's deep interior comes from the analysis of ground motion data, recorded by instruments around the entire planet, of seismic waves produced by large earthquakes. Seismologists have developed a long list of methods to process modern seismic data to "image" the Earth's interior. Much of our understanding of Earth's interior has been limited by the resolution of the tools available to construct these "images". At present, the massive increase in data volume has pushed the data processing infrastructure of seismology to the breaking point. The inability to handle data of this scale has imposed a significant barrier to scientific discoveries, especially for smaller research groups with limited resources. To help improve this situation, this project introduces a new data management and processing system that is portable and scalable enough to run on any platform, from a personal computer to a large-scale supercomputer. By leveraging and integrating sophisticated tools from the cloud computing and high-performance computing (HPC) communities, the system can fill the widening gap between the massive data made available by data centers and the inadequate data management and processing capability provided by current tools. Seamless discovery, access, transfer, and processing of data and metadata outside of data centers will become possible for the community. This project will also serve as the foundation to enable novel research utilizing massive data to change the way we study the structure, composition, and evolution of the Earth.
This project aims to develop a seismic data management and processing system composed of three parts: a scalable parallel processing framework based on a dataflow computation model, a NoSQL database system centered on a document store, and a container-based virtualization environment. The scalable processing component will be based on the iterative map-reduce model, using Apache Spark to handle scheduling and the flow of data through systems of different scales. Provenance-aware data management will be enabled by managing all data created during processing with MongoDB, including process-generated metadata, processed waveform data, processing parameters, and log outputs. All these core components, as well as a script to configure and deploy the framework on different systems, will be containerized with Singularity to provide portability. All these components serve the two primary goals of the project: to produce a system that will allow common seismology algorithms to run effectively on modern HPC platforms, and to provide the means for seismologists with average programming experience to implement their own algorithms to extend the system. The system will serve as the infrastructure that makes data-intensive research such as deep learning possible for smaller research groups, which usually lack the manpower to manage and process massive data in a sustainable fashion. By making it possible to process the massive data collected by an increasing number of instruments, it will facilitate the transition of the field into a data-intensive paradigm of scientific discovery.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
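The processing model described in the abstract, a map-reduce flow in which every step records its own processing history, can be illustrated with a minimal sketch. This is not the project's code: it uses only the Python standard library, the trace dictionaries and the `demean` and `stack` functions are hypothetical, and the in-memory `history` lists merely stand in for the documents that the abstract says would be stored in MongoDB. In the system described, Apache Spark would schedule the map and reduce steps; neither Spark nor MongoDB is used here.

```python
from functools import reduce
from statistics import fmean


def demean(trace):
    """Map step (hypothetical): remove the mean from one waveform."""
    mean = fmean(trace["samples"])
    record = {"step": "demean", "input": trace["id"], "mean_removed": mean}
    return {
        "id": trace["id"],
        "samples": [s - mean for s in trace["samples"]],
        # Provenance: the processing history travels with the data.
        "history": trace["history"] + [record],
    }


def stack(acc, trace):
    """Reduce step (hypothetical): sum two traces sample by sample."""
    record = {"step": "stack", "inputs": [acc["id"], trace["id"]]}
    return {
        "id": "stack",
        "samples": [a + b for a, b in zip(acc["samples"], trace["samples"])],
        # Keep the histories of both inputs, then append this step.
        "history": acc["history"] + trace["history"] + [record],
    }


# Toy input: two three-sample "waveforms" with empty histories.
traces = [
    {"id": "STA1", "samples": [1.0, 2.0, 3.0], "history": []},
    {"id": "STA2", "samples": [2.0, 4.0, 6.0], "history": []},
]

processed = list(map(demean, traces))   # map: independent per-trace work
stacked = reduce(stack, processed)      # reduce: combine the results

print(stacked["samples"])
for entry in stacked["history"]:
    print(entry)
```

Because each function returns a new dictionary instead of modifying its input, the map step can run on traces in any order or in parallel, which is the property a dataflow scheduler relies on.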

Project Outcomes

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
MsPASS: A Data Management and Processing Framework for Seismology
  • DOI:
  • Published:
    2021
  • Journal:
  • Impact factor:
    3.3
  • Authors:
    Wang, Yinzhi;Pavlis, Gary L.;Yang, Weiming;Ma, Jinxin
  • Corresponding author:
    Ma, Jinxin

Other publications by Yinzhi Wang

Performance Comparison of Julia Distributed Implementations of Dirichlet Process Mixture Models
(U-Th)/He thermochronology of metallic ore deposits in the Liaodong Peninsula: Implications for orefield evolution in northeast China
  • DOI:
    10.1016/j.oregeorev.2017.11.025
  • Published:
    2018
  • Journal:
  • Impact factor:
    3.3
  • Authors:
    Yinzhi Wang;Fei Wang;Lin Wu;Wenbei Shi;Liekun Yang
  • Corresponding author:
    Liekun Yang
Automatic BLAS Offloading on Unified Memory Architecture: A Study on NVIDIA Grace-Hopper
  • DOI:
    10.1145/3626203.3670561
  • Published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Junjie Li;Yinzhi Wang;Xiao Liang;Hang Liu
  • Corresponding author:
    Hang Liu
Perspectives and Experiences Supporting Containers for Research Computing at the Texas Advanced Computing Center
  • DOI:
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Erik Ferlanti;William J. Allen;Ernesto A. B. F. Lima;Yinzhi Wang;John Fonner
  • Corresponding author:
    John Fonner
Optimizing GPU-Enhanced HPC System and Cloud Procurements for Scientific Workloads
  • DOI:
  • Published:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    R. T. Evans;M. Cawood;Stephen Lien Harrell;Lei Huang;Si Liu;Chun;Amit Ruhela;Yinzhi Wang;Zhao Zhang
  • Corresponding author:
    Zhao Zhang

Other grants by Yinzhi Wang

OAC Core: Cost-Adaptive Monitoring and Real-Time Tuning at Function-Level
  • Award number:
    2402542
  • Fiscal year:
    2024
  • Amount:
    $232,800
  • Award type:
    Standard Grant
Collaborative Research: Frameworks: Seismic COmputational Platform for Empowering Discovery (SCOPED)
  • Award number:
    2103494
  • Fiscal year:
    2021
  • Amount:
    $232,800
  • Award type:
    Standard Grant