Collaborative Research: PPoSS: LARGE: ScaleStuds: Foundations for Correctness Checkability and Performance Predictability of Systems at Scale
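Basic Information
- Award number: 2118745
- Principal investigator:
- Amount: $624,700
- Host institution:
- Institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2021
- Funding country: United States
- Period: 2021-10-01 to 2026-09-30
- Status: Ongoing
- Source:
- Keywords: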
Abstract
In light of the limits of Moore's Law and Dennard scaling and the ever-increasing demand for computing, the last decade has seen unprecedented deployment scales: Google is known to run clusters with thousands of machines each, Apple deploys a total of 100,000 database machines, and Netflix runs tens of database clusters with 500 nodes each. This era of extreme-scale distributed systems has given birth to a new class of faults, "scalability faults": complex, latent, scale-dependent faults whose symptoms surface in large-scale deployments but not necessarily in small- or medium-scale deployments. Many fundamental research questions cannot be answered today. On correctness: How can program analysis detect bugs that manifest only at large scale? How can the various dimensions of system scale be tested and reproduced efficiently on one machine? How can scalability-related faults be prevented and fixed? On performance: How can we reason about software performance on various heterogeneous devices? How can the performance of fine-grained tasks be predicted accurately enough to reduce inaccuracies at the aggregate level, and how can performance be projected to future architectures? Finally, in combination: How can all these questions be answered for the larger connected ecosystem, not just for individual software and hardware components, so as to eventually build future-generation systems that are reproducible and verifiable by construction with respect to correctness and performance at scale? The ScaleStuds project involves a team of ten researchers developing the foundations of correctness checkability (CC) and performance predictability (PP) of systems at scale. The key principle of the project is to "check large with large": check large-scale systems with a large fleet of data, analyses, tests, learning, models, and proofs. The vision is to build an ecosystem of distributed, "CC+PP-certified" software-software and software-hardware interactions.
The project is paving the way toward this vision one "floor" at a time, creating composable building blocks ("the studs"). The project first builds new mechanisms, such as a scale-testing platform and a unified database of software program properties and hardware performance profiles that exposes clear APIs. These studs then enable multi-dimensional automated scalability testing, program analysis, and performance learning and prediction at various levels of the software/hardware stack. Ultimately, these experiences are intended to lead to correct and performant cross-layer and cross-service interactions, and to future design principles including reproducible-by-construction and verified-by-construction development methods. The project's novelties include: advances in debugging, testing, learning, and prediction methods that ensure correctness checkability and performance predictability of extreme-scale systems and applications on both classical and emerging hardware platforms; a unified data ecosystem of software/hardware properties and profiles that facilitates automated analyses via clear APIs; a multi-dimensional scale-testing framework that enables the development of new large-scale unit tests and program analyses; detailed device profiling and observation that enable large-scale performance learning and prediction and deliver lessons for learning and predicting the behavior of other devices and layers in an end-to-end hardware/software stack; and, ultimately, a clear definition of CC+PP certifiability for today's systems and for future verifiable/reproducible-by-construction development methods.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outcomes
Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
IsoBugView: Interactively Debugging Isolation Bugs in Database Applications
- DOI: 10.14778/3554821.3554885
- Published: 2022-08
- Journal:
- Impact factor: 0
- Authors: Drew Ripberger; Yifan Gan; Xueyuan Ren; Spyros Blanas; Yang Wang
- Corresponding author: Drew Ripberger; Yifan Gan; Xueyuan Ren; Spyros Blanas; Yang Wang
Developer's Responsibility or Database's Responsibility? Rethinking Concurrency Control in Databases
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Chao-Wei Cheng; Mingzhe Han; Nuo Xu; Spyros Blanas; Michael D. Bond; Yang Wang
- Corresponding author: Chao-Wei Cheng; Mingzhe Han; Nuo Xu; Spyros Blanas; Michael D. Bond; Yang Wang
On the Discontinuation of Persistent Memory: Looking Back to Look Forward
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Li, Tianxi; Wang, Yang; Lu, Xiaoyi
- Corresponding author: Lu, Xiaoyi
Other publications by Yang Wang
A New DDSCR structure with high holding voltage for robust ESD applications
- DOI: 10.1088/1674-1056/abd38f
- Published: 2021
- Journal:
- Impact factor: 1.7
- Authors: Zi-Jie Zhou; Xiang-Liang Jin; Yang Wang; Peng Dong
- Corresponding author: Peng Dong
Foresee Urban Sparse Traffic Accidents: A Spatiotemporal Multi-Granularity Perspective
- DOI: 10.1109/tkde.2020.3034312
- Published: 2022-08
- Journal:
- Impact factor: 0
- Authors: Zhengyang Zhou; Yang Wang; Xike Xie; Lianliang Chen; Chaochao Zhu
- Corresponding author: Chaochao Zhu
Correlation of choroidal thickness with age in healthy subjects: automatic detection and segmentation using a deep learning model
- DOI: 10.1007/s10792-022-02292-8
- Published: 2021
- Journal:
- Impact factor: 1.6
- Authors: Chengshan Lin; Yu Huang; W. Hsia; Yang Wang; Chia
- Corresponding author: Chia
Structure-activity relationships of N-methylthiolated beta-lactam antibiotics with C3 substitutions and their selective induction of apoptosis in human cancer cells.
- DOI: 10.2741/1611
- Published: 2005
- Journal:
- Impact factor: 0
- Authors: D. Kuhn; Yang Wang; V. Minić; Cristina M. Coates; G. Reddy; K. Daniel; J. Shim; Di Chen; K. Landis; F. Miller; E. Turos; Q. Dou
- Corresponding author: Q. Dou
FCA assisted IF Channel Construction towards Formulating Conceptual Data Modeling
- DOI:
- Published: 2006
- Journal:
- Impact factor: 0
- Authors: Yang Wang; Yang Wang
- Corresponding author: Yang Wang
Other grants by Yang Wang
Characterizing the Physical, Chemical, and Toxicological Properties of Secondhand Aerosols Generated from Electronic Nicotine Delivery Systems in Indoor Environments
- Award number: 2324142
- Fiscal year: 2023
- Amount: $624,700
- Award type: Standard Grant
Frequency-Domain Model Updating through Branch and Bound with Convex Relaxation
- Award number: 2211343
- Fiscal year: 2023
- Amount: $624,700
- Award type: Standard Grant
Characterizing the Physical, Chemical, and Toxicological Properties of Secondhand Aerosols Generated from Electronic Nicotine Delivery Systems in Indoor Environments
- Award number: 2204659
- Fiscal year: 2022
- Amount: $624,700
- Award type: Standard Grant
Collaborative Research: SaTC: CORE: Medium: Novel Algorithms and Tools for Empowering People Who Are Blind to Safeguard Private Visual Content
- Award number: 2126314
- Fiscal year: 2021
- Amount: $624,700
- Award type: Standard Grant
Collaborative Research: EAGER: SaTC-EDU: Teaching High School Students about Cybersecurity and Artificial Intelligence Ethics via Empathy-Driven Hands-On Projects
- Award number: 2114991
- Fiscal year: 2021
- Amount: $624,700
- Award type: Standard Grant
Collaborative Research: Gateway to North America--the Great American Biotic Interchange (GABI) in Mexico and Origin of C4 Grassland
- Award number: 1949814
- Fiscal year: 2020
- Amount: $624,700
- Award type: Standard Grant
Collaborative Research: Element: Development of MuST, A Multiple Scattering Theory based Computational Software for First Principles Approach to Disordered Materials
- Award number: 1931525
- Fiscal year: 2019
- Amount: $624,700
- Award type: Standard Grant
CAREER: Inclusive Privacy: Effective Privacy Management for People with Visual Impairments
- Award number: 2028387
- Fiscal year: 2019
- Amount: $624,700
- Award type: Continuing Grant
CNS Core: SMALL: Clarifying Experimenter Bias by Identifying and Visualizing Experiment Bottlenecks
- Award number: 1908020
- Fiscal year: 2019
- Amount: $624,700
- Award type: Standard Grant
CAREER: Inclusive Privacy: Effective Privacy Management for People with Visual Impairments
- Award number: 1652497
- Fiscal year: 2017
- Amount: $624,700
- Award type: Continuing Grant
Similar NSFC Grants
Research on Quantum Field Theory without a Lagrangian Description
- Award number: 24ZR1403900
- Year approved: 2024
- Amount: CNY 0
- Category: Provincial/municipal project
Cell Research
- Award number: 31224802
- Year approved: 2012
- Amount: CNY 240,000
- Category: Special fund project
Cell Research
- Award number: 31024804
- Year approved: 2010
- Amount: CNY 240,000
- Category: Special fund project
Cell Research
- Award number: 30824808
- Year approved: 2008
- Amount: CNY 240,000
- Category: Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
- Award number: 10774081
- Year approved: 2007
- Amount: CNY 450,000
- Category: General Program
Similar International Grants
Collaborative Research: PPoSS: Large: A Full-stack Approach to Declarative Analytics at Scale
- Award number: 2316161
- Fiscal year: 2023
- Amount: $624,700
- Award type: Continuing Grant
Collaborative Research: PPoSS: LARGE: Research into the Use and iNtegration of Data Movement Accelerators (RUN-DMX)
- Award number: 2316176
- Fiscal year: 2023
- Amount: $624,700
- Award type: Continuing Grant
Collaborative Research: PPoSS: Large: A Full-stack Approach to Declarative Analytics at Scale
- Award number: 2316158
- Fiscal year: 2023
- Amount: $624,700
- Award type: Continuing Grant
Collaborative Research: PPoSS: LARGE: Cross-layer Coordination and Optimization for Scalable and Sparse Tensor Networks (CROSS)
- Award number: 2316201
- Fiscal year: 2023
- Amount: $624,700
- Award type: Standard Grant
Collaborative Research: PPoSS: LARGE: Cross-layer Coordination and Optimization for Scalable and Sparse Tensor Networks (CROSS)
- Award number: 2316203
- Fiscal year: 2023
- Amount: $624,700
- Award type: Continuing Grant
Collaborative Research: PPoSS: LARGE: Research into the Use and iNtegration of Data Movement Accelerators (RUN-DMX)
- Award number: 2316177
- Fiscal year: 2023
- Amount: $624,700
- Award type: Continuing Grant
Collaborative Research: PPoSS: LARGE: Cross-layer Coordination and Optimization for Scalable and Sparse Tensor Networks (CROSS)
- Award number: 2316202
- Fiscal year: 2023
- Amount: $624,700
- Award type: Standard Grant
Collaborative Research: PPoSS: LARGE: General-Purpose Scalable Technologies for Fundamental Graph Problems
- Award number: 2316235
- Fiscal year: 2023
- Amount: $624,700
- Award type: Continuing Grant
Collaborative Research: PPoSS: LARGE: Principles and Infrastructure of Extreme Scale Edge Learning for Computational Screening and Surveillance for Health Care
- Award number: 2406572
- Fiscal year: 2023
- Amount: $624,700
- Award type: Continuing Grant
Collaborative Research: PPoSS: Large: A Full-stack Approach to Declarative Analytics at Scale
- Award number: 2316159
- Fiscal year: 2023
- Amount: $624,700
- Award type: Continuing Grant