CSR-PSCE, SM: Collaborative Research: VOLPEX: A Framework for Parallel Execution on Volatile Nodes

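Basic information

  • Award number:
    0834750
  • Principal investigator:
  • Amount:
    $280,000
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2008
  • Funding country:
    United States
  • Project period:
    2008-09-01 to 2012-08-31
  • Project status:
    Completed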

Project abstract

Ordinary PCs are widely employed for large-scale scientific computing today. Released in 2004, the BOINC middleware is managing over 500,000 volunteered PC nodes and providing computation power to around 30 scientific research projects, including Rosetta@home, Climateprediction.net, and IBM World Community Grid. The main attraction of such "virtual clusters" is that computing is "free", as minimal additional hardware and personnel resources are needed. The potential for exploiting such idle cycles is immense, since well under 1% of the world's 1 billion PCs are currently participating.

Currently, idle PCs are exploited for sequential and independent parallel tasks, but not for communicating parallel tasks, a common HPC paradigm. The central goal of this project is to achieve robust execution of communicating parallel applications on networked ordinary PCs. The challenge is that ordinary desktops are "volatile", i.e., their availability changes suddenly and frequently based on the desktop owner's actions. Checkpointing of parallel applications, the state of the art in fault-tolerant scientific computing, is not sufficient for environments with high failure rates.

This project is developing the VolPEx framework (Parallel Execution on Volatile nodes), which employs managed redundancy as the core mechanism to achieve seamless forward application progress in the presence of routine failures. The canonical execution model consists of two or more concurrent replicas of each process, with failed process replicas regenerated on demand from healthy ones. The following communication APIs are provided for application development:

1. Virtual Dataspace: An abstract API for asynchronous, anonymous Put/Get communication among tasks. The BOINC programming model of independent tasks is being extended with the dataspace API to allow inter-task communication.
2. Volpex MPI: A subset implementation of MPI with a communication layer customized for execution on volatile nodes.

The validation of this research will include execution of selected parallel applications on hundreds of nodes across campus LANs and thousands of nodes across the globe under Volpex/BOINC. The ability to transform ordinary PCs into a virtual cluster to run parallel codes will have a wide-ranging impact. Virtually all scientists will get access to HPC, while the need for clusters and the cost of purchasing, operating, and maintaining clusters will diminish.
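
The Put/Get model with replicated processes described in the abstract can be illustrated with a minimal sketch. This is not the VolPEx API: the class `Dataspace`, its methods `put` and `get`, the key `"partial_sum"`, and the first-writer-wins rule for duplicate puts are all assumptions made for illustration, and the shared store is a single in-memory dictionary rather than a networked service.

```python
import threading


class Dataspace:
    """Toy in-memory stand-in for a shared dataspace (hypothetical, not the VolPEx API)."""

    def __init__(self):
        self._items = {}
        self._cond = threading.Condition()

    def put(self, key, value):
        """Store value under key; return False if another replica already stored it."""
        with self._cond:
            if key in self._items:
                return False  # duplicate put from a slower replica: ignored (assumed rule)
            self._items[key] = value
            self._cond.notify_all()  # wake any task blocked in get()
            return True

    def get(self, key, timeout=None):
        """Block until some task has put key, then return its value."""
        with self._cond:
            if not self._cond.wait_for(lambda: key in self._items, timeout):
                raise TimeoutError(key)
            return self._items[key]


def producer(space, replica_id):
    # Every replica of the process computes the same result and attempts the same put.
    space.put("partial_sum", sum(range(10)))


def run_demo():
    space = Dataspace()
    replicas = [threading.Thread(target=producer, args=(space, i)) for i in range(2)]
    for t in replicas:
        t.start()
    # The consumer names only the key, never the producing task (anonymous communication).
    value = space.get("partial_sum", timeout=5)
    for t in replicas:
        t.join()
    return value
```

In this sketch the consumer makes progress as soon as any one replica of the producer completes its put, which is the sense in which redundancy hides the failure of an individual replica. Regenerating a failed replica from a healthy one is not modeled here.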

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Jaspal Subhlok

A Robust and Efficient Message Passing Library for Volunteer Computing Environments
  • DOI:
    10.1007/s10723-010-9172-x
  • Published:
    2010-11-18
  • Journal:
  • Impact factor:
    2.900
  • Authors:
    Rakhi Anand; Troy LeBlanc; Edgar Gabriel; Jaspal Subhlok
  • Corresponding author:
    Jaspal Subhlok
General-purpose blade infrastructure for configurable system architectures
  • DOI:
    10.1007/s10619-007-7016-x
  • Published:
    2007-09-06
  • Journal:
  • Impact factor:
    0.900
  • Authors:
    Kevin Leigh; Parthasarathy Ranganathan; Jaspal Subhlok
  • Corresponding author:
    Jaspal Subhlok

Other grants by Jaspal Subhlok

Collaborative Research: Tablet PC-Based Indexed Captioned Searchable Videos for STEM Coursework
  • Award number:
    0817558
  • Fiscal year:
    2008
  • Amount:
    $280,000
  • Award type:
    Standard Grant
Automatic Construction of Performance Skeletons for Grid Resource Selection
  • Award number:
    0410797
  • Fiscal year:
    2004
  • Amount:
    $280,000
  • Award type:
    Continuing Grant
SOFTWARE: Automatic Resource Selection in Dynamic Networked Computation Environments
  • Award number:
    0234328
  • Fiscal year:
    2003
  • Amount:
    $280,000
  • Award type:
    Continuing Grant

Similar overseas grants

CSR-PSCE, SM: MPI-PPA: Improving Efficiency of Large-Scale Clusters Through Statistical Performance Prediction
  • Award number:
    0936251
  • Fiscal year:
    2009
  • Amount:
    $280,000
  • Award type:
    Continuing Grant
Collaborative Research: CSR-PSCE, SM: Adaptive Memory Management in Shared Environments
  • Award number:
    0834323
  • Fiscal year:
    2008
  • Amount:
    $280,000
  • Award type:
    Continuing Grant
CSR-PSCE, SM: Trade-offs Between Static Power, Performance and Reliability in Future Chip Multiprocessors
  • Award number:
    0834799
  • Fiscal year:
    2008
  • Amount:
    $280,000
  • Award type:
    Standard Grant
CSR-PSCE, SM: Recovery Aware Parallel Computing
  • Award number:
    0834514
  • Fiscal year:
    2008
  • Amount:
    $280,000
  • Award type:
    Continuing Grant
CSR-PSCE, SM: A Holistic Design Approach to Reliability Using 3D Stacked
  • Award number:
    0834798
  • Fiscal year:
    2008
  • Amount:
    $280,000
  • Award type:
    Standard Grant
CSR-PSCE, SM: Automatic Multithreaded and Transactional Memory Workload Synthesis for Efficient Multi-core Design Space Evaluation
  • Award number:
    0834288
  • Fiscal year:
    2008
  • Amount:
    $280,000
  • Award type:
    Standard Grant
Collaborative Research: CSR-PSCE, SM: Memory Thermal Management for Multi-Core Systems
  • Award number:
    0834475
  • Fiscal year:
    2008
  • Amount:
    $280,000
  • Award type:
    Standard Grant
CSR-PSCE, SM: Memory Management Innovations for Next-Generation SMP
  • Award number:
    0834619
  • Fiscal year:
    2008
  • Amount:
    $280,000
  • Award type:
    Continuing Grant
CSR-PSCE, SM: Compiler-Directed System Optimization of a Highly-Parallel Fine-Grained Chip Multiprocessor
  • Award number:
    0834373
  • Fiscal year:
    2008
  • Amount:
    $280,000
  • Award type:
    Continuing Grant
Collaborative Research: CSR-PSCE, SM: Memory Thermal Management for Multi-Core Systems
  • Award number:
    0834469
  • Fiscal year:
    2008
  • Amount:
    $280,000
  • Award type:
    Standard Grant