CAREER: Dependable High Performance Scientific Computing at Extreme Scale via Algorithmic Fault Tolerance
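Basic information
- Award number: 1305624
- Principal investigator:
- Amount: $454,500
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2012
- Funding country: United States
- Period: 2012-09-01 to 2018-09-30
- Status: Completed
- Source:
- Keywords:
Project abstract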
Extreme-scale high-end computing platforms are expected to be available before 2020 and will have 100 million to 1 billion CPU cores. Because these platforms contain so many components, the probability that errors occur during the execution of an extreme-scale application is expected to be much higher than observed today. The goal of this CAREER research project is to develop highly efficient techniques to detect, locate, and correct both soft and hard errors according to the specific characteristics of an algorithm. The target algorithms include (1) Krylov subspace methods for solving sparse linear systems and eigenvalue problems; (2) direct methods for solving dense linear systems and eigenvalue problems; and (3) Newton's method for solving systems of nonlinear equations. This project will create significant educational outcomes by integrating the following four components: (1) establishing a supercomputing research laboratory to support senior design projects and REU, enhance graduate education and research, and demonstrate highly dependable applications on high-end computing platforms; (2) enriching the teaching of both undergraduate and graduate courses by integrating fault tolerance and high performance computing into the courses; (3) increasing minority student involvement by encouraging minority students to pursue graduate degrees in computing; and (4) offering free workshops to K-12 teachers and students.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Zizhong Chen
New-Sum: A Novel Online ABFT Scheme For General Iterative Methods
- DOI:
- Published: 2016
- Journal:
- Impact factor: 0
- Authors: Dingwen Tao; S. Song; S. Krishnamoorthy; Panruo Wu; Xin Liang; E. Zhang; D. Kerbyson; Zizhong Chen
- Corresponding author: Zizhong Chen
Fault tolerant matrix-matrix multiplication: correcting soft errors on-line
- DOI:
- Published: 2011
- Journal:
- Impact factor: 0
- Authors: Panruo Wu; Chong Ding; Longxiang Chen; Feng Gao; T. Davies; Christer Karlsson; Zizhong Chen
- Corresponding author: Zizhong Chen
TiO2 particles wrapped onto macroporous germanium skeleton as high performance anode for lithium-ion batteries
- DOI: 10.1016/j.cej.2019.122649
- Published: 2020-02
- Journal:
- Impact factor: 15.1
- Authors: Qiang Liu; Jiagang Hou; Caixia Xu; Zizhong Chen; Rong Qin; Hong Liu
- Corresponding author: Hong Liu
Improving performance of iterative methods by lossy checkponting
- DOI:
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Dingwen Tao; S. Di; Xin Liang; Zizhong Chen; F. Cappello
- Corresponding author: F. Cappello
Improving Performance of Data Dumping with Lossy Compression for Scientific Simulation
- DOI: 10.1109/cluster.2019.8891037
- Published: 2019
- Journal:
- Impact factor: 0
- Authors: Xin Liang; S. Di; Dingwen Tao; Sihuan Li; Bogdan Nicolae; Zizhong Chen; F. Cappello
- Corresponding author: F. Cappello
Other grants held by Zizhong Chen
Student Travel Support for the 2016 International Conference on Networking, Architecture, and Storage (NAS'16)
- Award number: 1646797
- Fiscal year: 2016
- Amount: $454,500
- Award type: Standard Grant
CSR: Small: Collaborative Research: EEDAG: Exploring Energy-Efficient Parallel Tasks Generation and Scheduling for Heterogeneous Multicore Systems
- Award number: 1304969
- Fiscal year: 2012
- Amount: $454,500
- Award type: Standard Grant
SHF: Small: FTLA: Fault Tolerant Linear Algebra Software for Massively Parallel Architectures
- Award number: 1305622
- Fiscal year: 2012
- Amount: $454,500
- Award type: Standard Grant
CAREER: Dependable High Performance Scientific Computing at Extreme Scale via Algorithmic Fault Tolerance
- Award number: 1150273
- Fiscal year: 2012
- Amount: $454,500
- Award type: Standard Grant
SHF: Small: FTLA: Fault Tolerant Linear Algebra Software for Massively Parallel Architectures
- Award number: 1118039
- Fiscal year: 2011
- Amount: $454,500
- Award type: Standard Grant
CSR: Small: Collaborative Research: EEDAG: Exploring Energy-Efficient Parallel Tasks Generation and Scheduling for Heterogeneous Multicore Systems
- Award number: 1118037
- Fiscal year: 2011
- Amount: $454,500
- Award type: Standard Grant
Similar overseas grants
Intelligent Dependable Environment Control For Sustainable Aquaculture
- Award number: EP/Y000773/1
- Fiscal year: 2024
- Amount: $454,500
- Award type: Research Grant
Development of model checking technology for dependable distributed systems
- Award number: 23H03370
- Fiscal year: 2023
- Amount: $454,500
- Award type: Grant-in-Aid for Scientific Research (B)
CAREER: Dependable and Secure Machine Learning Acceleration from Untrusted Hardware
- Award number: 2238873
- Fiscal year: 2023
- Amount: $454,500
- Award type: Continuing Grant
CAREER: Dependable and Secure Machine Learning Acceleration from Untrusted Hardware
- Award number: 2349538
- Fiscal year: 2023
- Amount: $454,500
- Award type: Continuing Grant
Fault-Tolerant Energy Management for Highly Dependable Real-Time Embedded Systems
- Award number: 2302651
- Fiscal year: 2023
- Amount: $454,500
- Award type: Standard Grant
High Dependable IoT System Platform by Verifying Synchronization of Distributed IoT Environment Supporting DX
- Award number: 23H03388
- Fiscal year: 2023
- Amount: $454,500
- Award type: Grant-in-Aid for Scientific Research (B)
The Power of Dependable Souls (PODS): A group-based, leisure-based community participation intervention for adults with serious mental illnesses
- Award number: 22K17564
- Fiscal year: 2022
- Amount: $454,500
- Award type: Grant-in-Aid for Early-Career Scientists
Dependable Predictive Inference with Uncertainty-Aware Machine Learning
- Award number: 2210637
- Fiscal year: 2022
- Amount: $454,500
- Award type: Continuing Grant
Dependable Analysis of Network Application Data
- Award number: RGPIN-2020-04696
- Fiscal year: 2022
- Amount: $454,500
- Award type: Discovery Grants Program - Individual