SPX: Collaborative Research: Cross-stack Memory Optimizations for Boosting I/O Performance of Deep Learning HPC Applications
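Basic information
- Award number: 1919075
- Principal investigator:
- Amount: $320,600
- Awardee institution:
- Awardee institution country: United States
- Project type: Standard Grant
- Fiscal year: 2019
- Funding country: United States
- Period: 2019-10-01 to 2023-05-31
- Project status: Completed
- Source:
- Keywords:
Project abstract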
New computing applications are emerging in smart networks, scientific explorations, business management, security, and healthcare. These applications depend on very large amounts of data. This data must be used in a fast and efficient manner. The use of large supercomputers to analyze such data is on the rise. The techniques they use are referred to as deep learning (DL) high-performance computing (HPC). Researchers are using DL HPC to make sense of this flood of data and obtain useful information. To do this, they must redesign HPC systems. A key challenge is how to use resources such as data storage and computer memory at a huge scale. This project will build Metis, a high-performance data storage system that uses new, end-to-end, hardware-supported memory and storage design to meet the needs of DL HPC applications. The goal is to satisfy the challenge posed by increasing data management performance for next-generation supercomputers. The project will connect several different computing communities and increase interactions among them. The project includes educational and engagement activities which will greatly increase the community's understanding of HPC systems. These activities include broadening participation activities to attract and retain new students. Special emphasis will be given to students from underrepresented groups. The project will encourage student interest in design and research in large-scale computing systems design.
This project brings together researchers in micro-architecture, distributed computing systems, namely cloud and HPC systems, storage systems, and power/energy modeling to boost DL HPC data processing performance. The research will yield a fundamentally new software-hardware co-designed memory compression technique that transparently compresses DL application memories with negligible runtime performance overhead. Metis will leverage the novel compression substrate to enable a distributed, intelligent, operating-system-level data cache that effectively exploits the physical memory freed via program-memory compression. The developed techniques will open doors for innovative HPC and scientific applications in a broad range of disciplines, which have not been previously possible. Metis' focus on addressing the challenges of increasing performance in the Exascale era, along with engaging researchers from multiple areas, aligns it very well with the goals and objectives of the SPX program. Additionally, the research will also create new knowledge on design principles of memory compression, and yield insights to provide seamless integration of DL applications into the next-generation DL-aware supercomputer infrastructure.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal articles (8)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
In Search of a Fast and Efficient Serverless DAG Engine
- DOI: 10.1109/pdsw49588.2019.00005
- Published: 2019-10
- Journal:
- Impact factor: 0
- Authors: Benjamin Carver;Jingyuan Zhang;Ao Wang;Yue Cheng
- Corresponding authors: Benjamin Carver;Jingyuan Zhang;Ao Wang;Yue Cheng
InfiniCache: Exploiting Ephemeral Serverless Functions to Build a Cost-Effective Memory Cache
- DOI:
- Published: 2020-01
- Journal:
- Impact factor: 0
- Authors: Ao Wang;Jingyuan Zhang;Xiaolong Ma;Ali Anwar;Lukas Rupprecht;Dimitrios Skourtis;Vasily Tarasov
- Corresponding authors: Ao Wang;Jingyuan Zhang;Xiaolong Ma;Ali Anwar;Lukas Rupprecht;Dimitrios Skourtis;Vasily Tarasov
FaaSNet: Scalable and Fast Provisioning of Custom Serverless Container Runtimes at Alibaba Cloud Function Compute
- DOI:
- Published: 2021-05
- Journal:
- Impact factor: 0
- Authors: Ao Wang;Shuai Chang;Huangshi Tian;Hongqi Wang;Haoran Yang;Huiba Li;Rui Du;Yue Cheng
- Corresponding authors: Ao Wang;Shuai Chang;Huangshi Tian;Hongqi Wang;Haoran Yang;Huiba Li;Rui Du;Yue Cheng
SFS: Smart OS Scheduling for Serverless Functions
- DOI: 10.1109/sc41404.2022.00047
- Published: 2022-09
- Journal:
- Impact factor: 0
- Authors: Yuqi Fu;Li Liu;Haoliang Wang;Yue Cheng;Songqing Chen
- Corresponding authors: Yuqi Fu;Li Liu;Haoliang Wang;Yue Cheng;Songqing Chen
FedAT: A High-Performance and Communication-Efficient Federated Learning System with Asynchronous Tiers
- DOI: 10.1145/3458817.3476211
- Published: 2020-10
- Journal:
- Impact factor: 0
- Authors: Zheng Chai;Yujing Chen;Ali Anwar;Liang Zhao;Yue Cheng;H. Rangwala
- Corresponding authors: Zheng Chai;Yujing Chen;Ali Anwar;Liang Zhao;Yue Cheng;H. Rangwala
Other publications by Yue Cheng
Calibration of reduced-order model for a coupled Burgers equations based on PC-EnKF
- DOI: 10.1051/mmnp/2018023
- Published: 2018
- Journal:
- Impact factor: 2.2
- Authors: Yuepeng Wang;Yue Cheng;Zongyuan Zhang;Guang Lin
- Corresponding author: Guang Lin
Luminescence properties of Tm3+ in phosphate glasses sensitized by Sb3+ co-doping
- DOI:
- Published: 2019
- Journal:
- Impact factor: 3.6
- Authors: Yang Wang;Jin Xie;Yue Cheng;Ziwei Zhao;Xianming Zhao;Guorong Chen
- Corresponding author: Guorong Chen
CpG island methylator phenotype association with elevated serum alpha-fetoprotein level in hepatocellular carcinoma.
- DOI:
- Published: 2007
- Journal:
- Impact factor: 11.5
- Authors: Changsong Zhang;Zhengyou Li;Yue Cheng;Fengqi Jia;Rong Li;Mengchao Wu;Ke Li;Lixin Wei
- Corresponding author: Lixin Wei
Company capital structure and tax : a study of mid-sized European companies
- DOI:
- Published: 2008
- Journal:
- Impact factor: 0
- Authors: Yue Cheng
- Corresponding author: Yue Cheng
Chromosome 13q12 region critical for the viability and growth of nasopharyngeal carcinoma hybrids
- DOI:
- Published: 2004
- Journal:
- Impact factor: 6.4
- Authors: Yue Cheng;H. Lung;P. S. Wong;Da Cheng Hao;Chim Shek Man;E. Stanbridge;M. Lung
- Corresponding author: M. Lung
Other grants by Yue Cheng
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
- Award number: 2403313
- Fiscal year: 2024
- Award amount: $320,600
- Project type: Standard Grant
SPX: Collaborative Research: Cross-stack Memory Optimizations for Boosting I/O Performance of Deep Learning HPC Applications
- Award number: 2318628
- Fiscal year: 2022
- Award amount: $320,600
- Project type: Standard Grant
CAREER: Harnessing Serverless Functions to Build Highly Elastic Cloud Storage Infrastructure
- Award number: 2322860
- Fiscal year: 2022
- Award amount: $320,600
- Project type: Continuing Grant
CAREER: Harnessing Serverless Functions to Build Highly Elastic Cloud Storage Infrastructure
- Award number: 2045680
- Fiscal year: 2021
- Award amount: $320,600
- Project type: Continuing Grant
Similar overseas grants
SPX: Collaborative Research: Automated Synthesis of Extreme-Scale Computing Systems Using Non-Volatile Memory
- Award number: 2408925
- Fiscal year: 2023
- Award amount: $320,600
- Project type: Standard Grant
SPX: Collaborative Research: Scalable Neural Network Paradigms to Address Variability in Emerging Device based Platforms for Large Scale Neuromorphic Computing
- Award number: 2401544
- Fiscal year: 2023
- Award amount: $320,600
- Project type: Standard Grant
SPX: Collaborative Research: Intelligent Communication Fabrics to Facilitate Extreme Scale Computing
- Award number: 2412182
- Fiscal year: 2023
- Award amount: $320,600
- Project type: Standard Grant
SPX: Collaborative Research: Cross-stack Memory Optimizations for Boosting I/O Performance of Deep Learning HPC Applications
- Award number: 2318628
- Fiscal year: 2022
- Award amount: $320,600
- Project type: Standard Grant
SPX: Collaborative Research: NG4S: A Next-generation Geo-distributed Scalable Stateful Stream Processing System
- Award number: 2202859
- Fiscal year: 2022
- Award amount: $320,600
- Project type: Standard Grant
SPX: Collaborative Research: FASTLEAP: FPGA based compact Deep Learning Platform
- Award number: 2333009
- Fiscal year: 2022
- Award amount: $320,600
- Project type: Standard Grant
SPX: Collaborative Research: Memory Fabric: Data Management for Large-scale Hybrid Memory Systems
- Award number: 2132049
- Fiscal year: 2021
- Award amount: $320,600
- Project type: Standard Grant
SPX: Collaborative Research: Automated Synthesis of Extreme-Scale Computing Systems Using Non-Volatile Memory
- Award number: 2113307
- Fiscal year: 2020
- Award amount: $320,600
- Project type: Standard Grant
SPX: Collaborative Research: FASTLEAP: FPGA based compact Deep Learning Platform
- Award number: 1919117
- Fiscal year: 2019
- Award amount: $320,600
- Project type: Standard Grant
SPX: Collaborative Research: Intelligent Communication Fabrics to Facilitate Extreme Scale Computing
- Award number: 1918987
- Fiscal year: 2019
- Award amount: $320,600
- Project type: Standard Grant