C5: Collaborative and Cross-Context Cluster Configuration for Distributed Data-Parallel Processing

Basic Information

Project Abstract

Many organizations now routinely analyze large datasets. To do so, they use distributed data-parallel processing systems running on clusters of commodity resources. Data processing frameworks and cloud computing particularly benefit smaller organizations and individual users, allowing them to work with large datasets at a high level of abstraction. Still, users are required to configure adequate resources for their data processing jobs. This is often not straightforward, and users frequently overprovision resources for their jobs, leading to low resource utilization as well as high costs and energy consumption.

Numerous works have addressed this problem over the last decade for big data frameworks, scientific workflows, and machine learning systems, using statistical tools and performance models. However, much of this effort has focused on industry settings, either assuming that data on previous job executions is available or relying on potentially costly dedicated profiling. Little research has addressed use cases where runtime data is not as easily available.

To address this research gap, the proposed project, C5, aims to develop new methods for the collaborative use of runtime data. We believe that sharing runtime information across different execution contexts presents a significant opportunity for performance modeling and model-based resource management in many situations, especially when the availability of runtime data is limited, and that it will improve the efficiency of distributed data-parallel processing.

The methods we plan to develop and evaluate in this project include:
- Similarity measures for computational resources and processing jobs, to support the use of runtime data and performance models across execution contexts
- Model selection and combination methods for robust performance estimation, even if only limited training data is available or model components were trained in other contexts
- Adjustment strategies that allow training data, performance models, and resource configurations to be updated efficiently at runtime

In addition to new methods for cross-context cluster configuration optimization based on shared performance data and models, we plan to conduct a thorough analysis of real workloads, design reproducible experiments based on infrastructure-as-code definitions and benchmarks, and provide a working implementation of the envisioned data sharing platform to the general public and to ongoing collaborative research projects.

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants of Professor Dr. Odej Kao

A Scalable, Massively-Parallel Runtime System with Predictable Performance
  • Grant number:
    248358398
  • Fiscal year:
    2013
  • Funding amount:
    --
  • Project category:
    Research Units
Massively Parallel, Adaptive and Fault-Tolerant Execution of Data Flow Programs on Dynamic Clouds
  • Grant number:
    174446757
  • Fiscal year:
    2010
  • Funding amount:
    --
  • Project category:
    Research Units

Similar overseas grants

Collaborative Research: Laboratory Measurements of Oxygen (O) and Nitrogen (N2) Ultraviolet (UV) Cross Sections by Particle Impact for Remote Sensing of Thermosphere O/N2 Variation
  • Grant number:
    2334619
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project category:
    Standard Grant
Collaborative Research: Frameworks: MobilityNet: A Trustworthy CI Emulation Tool for Cross-Domain Mobility Data Generation and Sharing towards Multidisciplinary Innovations
  • Grant number:
    2411152
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project category:
    Standard Grant
Collaborative Research: Frameworks: MobilityNet: A Trustworthy CI Emulation Tool for Cross-Domain Mobility Data Generation and Sharing towards Multidisciplinary Innovations
  • Grant number:
    2411153
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project category:
    Standard Grant
Collaborative Research: Laboratory Measurements of Oxygen (O) and Nitrogen (N2) Ultraviolet (UV) Cross Sections by Particle Impact for Remote Sensing of Thermosphere O/N2 Variation
  • Grant number:
    2334618
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project category:
    Continuing Grant
Collaborative Research: Frameworks: MobilityNet: A Trustworthy CI Emulation Tool for Cross-Domain Mobility Data Generation and Sharing towards Multidisciplinary Innovations
  • Grant number:
    2411151
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project category:
    Standard Grant
Collaborative Research: DESC: Type I: FLEX: Building Future-proof Learning-Enabled Cyber-Physical Systems with Cross-Layer Extensible and Adaptive Design
  • Grant number:
    2324936
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project category:
    Standard Grant
Collaborative Research: DESC: Type I: FLEX: Building Future-proof Learning-Enabled Cyber-Physical Systems with Cross-Layer Extensible and Adaptive Design
  • Grant number:
    2324937
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project category:
    Standard Grant
Collaborative Research: CISE: Large: Cross-Layer Resilience to Silent Data Corruption
  • Grant number:
    2321492
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
    Continuing Grant
Collaborative Research: CyberTraining: CIP: A Cross-Institutional Research Engagement Network for CI Facilitators
  • Grant number:
    2230108
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
    Standard Grant
Collaborative Research: CyberTraining: Implementation: Medium: Cross-Disciplinary Training for Joint Cyber-Physical Systems and IoT Security
  • Grant number:
    2230086
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
    Continuing Grant