A Scalable, Massively-Parallel Runtime System with Predictable Performance


Basic Information

Project Abstract

The goal of Project B within the Stratosphere II Research Unit is to research and develop the runtime system “Aura”, which executes multiple concurrent data analysis programs in a massively parallel fashion on a distributed cloud or cluster infrastructure. Project B is divided into two primary research areas, RA-B.1 and RA-B.2, which enhance the runtime environment with capabilities to handle evolving datasets and to represent and manage the resulting distributed state, with the goal of supporting novel iterative data analysis algorithms in a massively parallel, fault-tolerant way. We plan to establish a novel execution model, the so-called Supremo Execution Plan (SEP), which models the framework’s workload in the form in which it is executed. A SEP is a restricted cyclic graph that combines evolving datasets, in the form of physical views, with semantically rich operators. Physical views can represent traditional data sources and sinks but will also be able to hold the state in iterative data analysis, as well as the state occurring in stateful operators on infinite data, e.g. windowed operators. In contrast to the UDF black boxes of Stratosphere I’s Nephele, the added knowledge about operator characteristics, in combination with workload-aware (re-)scheduling policies, allows the runtime core’s scheduler to provide predictable runtime behavior for individual deployed data analysis programs. In particular, this project aims at answering the following questions:
1. How must a runtime system be architected to optimize for the execution of iterative data analysis programs on various hardware architectures, exploiting the advantages of virtualized hardware?
2. How can we efficiently maintain state and provide fault-tolerant execution of programs with iterations on large compute clusters?
3. How can we adapt to the characteristics of virtualization methods to achieve predictable performance in terms of low-latency bounds and resource guarantees? How do we provide up- and down-scaling based on computational needs or on ingestion rates, assuming on-demand elasticity of cloud systems?
4. How can large, complex models be shared and distributed between workloads of concurrent queries?

Project Outcomes

Journal articles (8)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Memorization of Materialization Points
Network-aware resource management for scalable data analytics frameworks
Evaluation of Network Topology Inference in Opaque Compute Clouds through End-to-End Measurements
  • DOI:
    10.1109/cloud.2011.30
  • Published:
    2011-07
  • Journal:
  • Impact factor:
    0
  • Authors:
    Dominic Battré;Natalia Frejnik;Siddhant Goel;O. Kao;Daniel Warneke
  • Corresponding authors:
    Dominic Battré;Natalia Frejnik;Siddhant Goel;O. Kao;Daniel Warneke
Inferring Network Topologies in Infrastructure as a Service Cloud
Ellis: Dynamically Scaling Distributed Dataflows to Meet Runtime Targets

Other grants by Professor Dr. Odej Kao

Massively Parallel, Adaptive and Fault-Tolerant Execution of Data Flow Programs on Dynamic Clouds
  • Grant number:
    174446757
  • Fiscal year:
    2010
  • Funding amount:
    --
  • Project type:
    Research Units
C5: Collaborative and Cross-Context Cluster Configuration for Distributed Data-Parallel Processing
  • Grant number:
    506529034
  • Fiscal year:
  • Funding amount:
    --
  • Project type:
    Research Grants

Similar Overseas Grants

MFB: Massively parallel identification of translation regulatory sequences in human and viral mRNAs
  • Grant number:
    2330451
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project type:
    Standard Grant
Massively-parallel functional interrogation of genetic variation in CMD-associated alpha-dystroglycan glycosylating enzymes
  • Grant number:
    10802855
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type:
Massively Parallel Optoacoustic Retinal Stimulation at Micrometer-Resolution
  • Grant number:
    10731795
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type:
SBIR Phase I: Massively Parallel Protocols for Software-based Wireless Systems
  • Grant number:
    2322307
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type:
    Standard Grant
Collaborative Research: Ideas Lab: Discovery of Novel Functional RNA Classes by Computational Integration of Massively-Parallel RBP Binding and Structure Data
  • Grant number:
    2243706
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type:
    Standard Grant
RII Track-4: NSF: Massively Parallel Graph Processing on Next-Generation Multi-GPU Supercomputers
  • Grant number:
    2229394
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type:
    Standard Grant
Collaborative Research: Ideas Lab: Discovery of Novel Functional RNA Classes by Computational Integration of Massively-Parallel RBP Binding and Structure Data
  • Grant number:
    2243704
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type:
    Standard Grant
Performance Improvement of Massively Parallel Sample-Based Model Predictive Control
  • Grant number:
    23K03896
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Collaborative Research: Ideas Lab: Discovery of Novel Functional RNA Classes by Computational Integration of Massively-Parallel RBP Binding and Structure Data
  • Grant number:
    2243703
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type:
    Standard Grant
Collaborative Research: Ideas Lab: Discovery of Novel Functional RNA Classes by Computational Integration of Massively-Parallel RBP Binding and Structure Data
  • Grant number:
    2243705
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type:
    Standard Grant