SPX: Collaborative Research: Pinpointing and Resolving Scalability Culprits Hidden in Different Components of the Whole System Stack
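Basic Information
- Award number: 2024253
- Principal investigator:
- Amount: $414,500
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2019
- Funding country: United States
- Period: 2019-11-01 to 2023-09-30
- Project status: Completed
- Source:
- Keywords:
Project Abstract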
Modern computers leverage multi-core or many-core processors to accelerate parallel applications. Unfortunately, the speedup of these applications is typically far from ideal because of hidden scalability issues. Previous research mainly focuses on application code to identify scalability bottlenecks, neglecting the fact that the application code interacts with numerous external components, including the memory allocator, third-party runtime libraries, and the operating system. Understanding and fixing scalability problems should hence go beyond application code and consider the whole software stack. The project's novelties are to pinpoint scalability culprits hidden in different components of the whole stack and to fix the scalability bottlenecks automatically. The project's impacts are significantly improved performance for applications running on multi-core processors, and thus accelerated scientific discoveries and energy savings.
This project aims to systematically pinpoint and resolve latent software contention in all components of the whole software stack from user space. The proposed approaches are urgent because of the pervasive use of multi-core and many-core hardware. Also, according to Amdahl's law, a small degree of latent contention in any of the components may substantially limit the speedup potential on this modern hardware. The research plans to design low-overhead profilers that obtain runtime information on system calls, memory allocator behaviors, and all interacting events between components, as well as analyzers that automatically pinpoint the root causes of scalability bottlenecks. Through a runtime optimizer, the research aims to fix the identified scalability issues without intervention from the programmer. The project has the potential to dramatically reduce manual effort for software optimization and improve performance for parallel applications on modern hardware.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
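The abstract's appeal to Amdahl's law can be made concrete with a small calculation. The sketch below is illustrative only and is not part of the funded project: the function name `amdahl_speedup` and the serialized fractions and core counts are assumptions chosen for the example. It evaluates the standard bound speedup = 1 / (s + (1 - s) / N), where s is the fraction of execution that is serialized (for instance by lock contention inside an allocator or the kernel) and N is the number of cores.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Upper bound on speedup under Amdahl's law.

    serial_fraction: share of the runtime that cannot run in parallel
                     (hypothetical values; the source gives no measurements).
    cores:           number of cores available to the parallel part.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


if __name__ == "__main__":
    # Illustrative serialized fractions and core counts, not data from the award.
    for s in (0.0, 0.01, 0.05, 0.10):
        row = ", ".join(
            f"{n} cores: {amdahl_speedup(s, n):.1f}x" for n in (8, 64, 512)
        )
        print(f"serialized fraction {s:.2f} -> {row}")
```

With these illustrative numbers, a 5% serialized fraction caps the speedup on 64 cores at about 15.4x, and no core count can push it past 20x, which is why contention hidden in any single component matters.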
Project Outcomes
Journal articles (5)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
WATCHER: in-situ failure diagnosis
- DOI: 10.1145/3428211
- Published: 2020-11
- Journal:
- Impact factor: 0
- Authors: Hongyu Liu; Sam Silvestro; X. Zhang; Jian Huang; Tongping Liu
- Corresponding authors: Hongyu Liu; Sam Silvestro; X. Zhang; Jian Huang; Tongping Liu
Deadlock prediction via generalized dependency
- DOI: 10.1145/3533767.3534377
- Published: 2022-07
- Journal:
- Impact factor: 0
- Authors: Jinpeng Zhou; Hanmei Yang; J. Lange; Tongping Liu
- Corresponding authors: Jinpeng Zhou; Hanmei Yang; J. Lange; Tongping Liu
CachePerf: A Unified Cache Miss Classifier via Hybrid Hardware Sampling
- DOI: 10.1145/3547353.3526954
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: Zhou, Jin; Tang, Steven; Yang, Hanmei; Liu, Tongping
- Corresponding author: Liu, Tongping
Prober: Practically Defending Overflows with Page Protection
- DOI: 10.1145/3324884.3416533
- Published: 2020-09
- Journal:
- Impact factor: 0
- Authors: Hongyu Liu; Ruiqin Tian; Tongping Liu; Bin Ren
- Corresponding authors: Hongyu Liu; Ruiqin Tian; Tongping Liu; Bin Ren
NumaPerf: predictive NUMA profiling
- DOI: 10.1145/3447818.3460361
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Zhao, Xin; Zhou, Jin; Guan, Hui; Wang, Wei; Liu, Xu; Liu, Tongping
- Corresponding author: Liu, Tongping
Other publications by Tongping Liu
Exploring Performance and Cost Optimization with ASIC-Based CXL Memory
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Yupeng Tang; Ping Zhou; Wenhui Zhang; Henry Hu; Qirui Yang; Hao Xiang; Tongping Liu; Jiaxin Shan; Ruoyun Huang; Cheng Zhao; Cheng Chen; Hui Zhang; Fei Liu; Shuai Zhang; Xiaoning Ding; Jianjun Chen
- Corresponding author: Jianjun Chen
Cheetah: Detecting false sharing efficiently and effectively
- DOI:
- Published: 2016
- Journal:
- Impact factor: 0
- Authors: Tongping Liu; Xu Liu
- Corresponding author: Xu Liu
Other grants by Tongping Liu
An Educational Tool for Teaching and Learning Concurrent Computer Programming Techniques
- Award number: 2215193
- Fiscal year: 2022
- Award amount: $414,500
- Project type: Standard Grant
SPX: Collaborative Research: Pinpointing and Resolving Scalability Culprits Hidden in Different Components of the Whole System Stack
- Award number: 1823004
- Fiscal year: 2018
- Award amount: $414,500
- Project type: Standard Grant
CRII: SHF: EVID: Evidence-Assisted Detection and Elimination of Memory Errors in Single and Multi-threaded Programs
- Award number: 1566154
- Fiscal year: 2016
- Award amount: $414,500
- Project type: Standard Grant
Similar overseas grants
SPX: Collaborative Research: Automated Synthesis of Extreme-Scale Computing Systems Using Non-Volatile Memory
- Award number: 2408925
- Fiscal year: 2023
- Award amount: $414,500
- Project type: Standard Grant
SPX: Collaborative Research: Scalable Neural Network Paradigms to Address Variability in Emerging Device based Platforms for Large Scale Neuromorphic Computing
- Award number: 2401544
- Fiscal year: 2023
- Award amount: $414,500
- Project type: Standard Grant
SPX: Collaborative Research: Intelligent Communication Fabrics to Facilitate Extreme Scale Computing
- Award number: 2412182
- Fiscal year: 2023
- Award amount: $414,500
- Project type: Standard Grant
SPX: Collaborative Research: Cross-stack Memory Optimizations for Boosting I/O Performance of Deep Learning HPC Applications
- Award number: 2318628
- Fiscal year: 2022
- Award amount: $414,500
- Project type: Standard Grant
SPX: Collaborative Research: NG4S: A Next-generation Geo-distributed Scalable Stateful Stream Processing System
- Award number: 2202859
- Fiscal year: 2022
- Award amount: $414,500
- Project type: Standard Grant
SPX: Collaborative Research: FASTLEAP: FPGA based compact Deep Learning Platform
- Award number: 2333009
- Fiscal year: 2022
- Award amount: $414,500
- Project type: Standard Grant
SPX: Collaborative Research: Memory Fabric: Data Management for Large-scale Hybrid Memory Systems
- Award number: 2132049
- Fiscal year: 2021
- Award amount: $414,500
- Project type: Standard Grant
SPX: Collaborative Research: Automated Synthesis of Extreme-Scale Computing Systems Using Non-Volatile Memory
- Award number: 2113307
- Fiscal year: 2020
- Award amount: $414,500
- Project type: Standard Grant
SPX: Collaborative Research: FASTLEAP: FPGA based compact Deep Learning Platform
- Award number: 1919117
- Fiscal year: 2019
- Award amount: $414,500
- Project type: Standard Grant
SPX: Collaborative Research: Intelligent Communication Fabrics to Facilitate Extreme Scale Computing
- Award number: 1918987
- Fiscal year: 2019
- Award amount: $414,500
- Project type: Standard Grant