PPoSS: Planning: CP2: Towards Systems Correctness Checkability and Performance Predictability at Scale

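Basic information

  • Award number:
    2028427
  • Principal investigator:
  • Amount:
    $248,000
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2020
  • Funding country:
    United States
  • Project period:
    2020-09-01 to 2022-08-31
  • Project status:
    Completed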

Project abstract

As a critical backend for many of today's applications and services, large-scale distributed systems must be highly reliable. In the last couple of years, the field has witnessed a phenomenal scale of deployment: Google is known to run clusters with thousands of machines each, Apple deploys over 100,000 database machines, and Netflix runs tens of database clusters with 500 nodes each. This new era of cloud-scale distributed systems has given birth to a new class of faults, scalability faults: faults whose symptoms surface in large-scale deployments but not necessarily in small- or medium-scale deployments. The CP2 project is proposed to solve the problem of correctness checkability and performance predictability of systems at extreme scale. Specifically, the project will analyze over 500 real-world scalability faults in over a dozen large-scale systems, develop a single-machine scale-checking framework that allows developers to test large distributed code on one or a few machines, and provide groundwork for compute- and I/O-performance predictability of large-scale jobs on both existing and future architectures. These tasks will advance debugging, testing, learning, and prediction methods on both traditional and emerging hardware platforms and ultimately lead to correct-by-construction development methods. The CP2 project will have impact in multiple disciplines, including systems (cloud/datacenter systems reliability), programming languages/compilers (new static/dynamic analysis techniques), architecture (compute/storage prediction for heterogeneous hardware), algorithms (the use of learning methods), and high-performance computing (benchmarking of HPC systems/applications).

In terms of societal benefits, the CP2 project addresses paramount issues mentioned in the NSF Strategic Plan for 2018-2022. More specifically, society increasingly depends on complicated systems that are products of human ingenuity, including ecosystems of large and complex software with millions of lines of code running on thousands of machines. CP2 will address the challenges of understanding and predicting the behavior of such systems. Furthermore, as society’s reliance on complex systems grows, learning about their robustness and understanding how to strengthen them are of increasing importance. In terms of education, the CP2 project offers unique hands-on research and education with cutting-edge systems technology, in which students will be trained to operate software on a large number of machines and analyze its performance and correctness. The results of the CP2 project will be released through the classic medium of publication, through the development of numerous software artifacts that will be open-sourced, and finally through collaboration with various industry partners to help shape the next generation of large-scale systems.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Haryadi Gunawi

Collaborative Research: PPoSS: LARGE: ScaleStuds: Foundations for Correctness Checkability and Performance Predictability of Systems at Scale
  • Award number:
    2119184
  • Fiscal year:
    2021
  • Award amount:
    $248,000
  • Award type:
    Continuing Grant
USENIX FAST 2017 NSF Student Travel Support
  • Award number:
    1727380
  • Fiscal year:
    2017
  • Award amount:
    $248,000
  • Award type:
    Standard Grant
CSR: Medium: Combating Distributed Concurrency Bugs in Cloud Systems
  • Award number:
    1563956
  • Fiscal year:
    2016
  • Award amount:
    $248,000
  • Award type:
    Continuing Grant
CSR: Small: BreezeFS: File System Transformation for Cloud and Multistore Era
  • Award number:
    1526304
  • Fiscal year:
    2015
  • Award amount:
    $248,000
  • Award type:
    Standard Grant
CAREER: DrCloud: Drill-Ready Cloud Computing
  • Award number:
    1350499
  • Fiscal year:
    2014
  • Award amount:
    $248,000
  • Award type:
    Continuing Grant
XPS: CLCCA: LigHTS: Lagging-Hardware Tolerant Systems
  • Award number:
    1336580
  • Fiscal year:
    2013
  • Award amount:
    $248,000
  • Award type:
    Standard Grant
DC: Small: Collaborative Research: DARE: Declarative and Scalable Recovery
  • Award number:
    1321958
  • Fiscal year:
    2012
  • Award amount:
    $248,000
  • Award type:
    Standard Grant
DC: Small: Collaborative Research: DARE: Declarative and Scalable Recovery
  • Award number:
    1016924
  • Fiscal year:
    2010
  • Award amount:
    $248,000
  • Award type:
    Standard Grant

Similar overseas grants

HoloSurge: Multimodal 3D Holographic tool and real-time Guidance System with point-of-care diagnostics for surgical planning and interventions on liver and pancreatic cancers
  • Award number:
    10103131
  • Fiscal year:
    2024
  • Award amount:
    $248,000
  • Award type:
    EU-Funded
Planning Grant: Developing capacity to attract diverse students to the geosciences: A public relations framework
  • Award number:
    2326816
  • Fiscal year:
    2024
  • Award amount:
    $248,000
  • Award type:
    Standard Grant
Planning: FIRE-PLAN: Building Wildland Fire Science Capacity in Alaska Through The University of Alaska Fairbanks Rural Campuses
  • Award number:
    2333423
  • Fiscal year:
    2024
  • Award amount:
    $248,000
  • Award type:
    Standard Grant
Collaborative Research: Planning: FIRE-PLAN: High-Spatiotemporal-Resolution Sensing and Digital Twin to Advance Wildland Fire Science
  • Award number:
    2335568
  • Fiscal year:
    2024
  • Award amount:
    $248,000
  • Award type:
    Standard Grant
Collaborative Research: Planning: FIRE-PLAN: High-Spatiotemporal-Resolution Sensing and Digital Twin to Advance Wildland Fire Science
  • Award number:
    2335569
  • Fiscal year:
    2024
  • Award amount:
    $248,000
  • Award type:
    Standard Grant
Planning: FIRE-PLAN: Exploring fire as medicine to revitalize cultural burning in the Upper Midwest
  • Award number:
    2349282
  • Fiscal year:
    2024
  • Award amount:
    $248,000
  • Award type:
    Standard Grant
CC* Planning: Strengthening Central Michigan University's Cyberinfrastructure
  • Award number:
    2345749
  • Fiscal year:
    2024
  • Award amount:
    $248,000
  • Award type:
    Standard Grant
CAREER: Statistical Power Analysis and Optimal Sample Size Planning for Longitudinal Studies in STEM Education
  • Award number:
    2339353
  • Fiscal year:
    2024
  • Award amount:
    $248,000
  • Award type:
    Continuing Grant
Planning: Advancing Discovery on a Sustainable National Research Enterprise
  • Award number:
    2412406
  • Fiscal year:
    2024
  • Award amount:
    $248,000
  • Award type:
    Standard Grant
Planning: Artificial Intelligence Assisted High-Performance Parallel Computing for Power System Optimization
  • Award number:
    2414141
  • Fiscal year:
    2024
  • Award amount:
    $248,000
  • Award type:
    Standard Grant