CAREER: Recursive Distributed Matrix and Tensor Decompositions on Neural Engines

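Basic Information

Award number:
2146509
Principal investigator:
Panruo Wu
Amount:
approx. $528,700
Host institution:
University of Houston
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2022
Funding country:
United States
Period:
2022-03-01 to 2027-02-28
Status:
Ongoing (not yet closed)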

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2146509&HistoricalAwards=false
Keywords:
CAREER Recursive Distributed Matrix Tensor

Project Abstract

Matrix and tensor decompositions are among the most important building blocks of scientific computing and are increasingly important in data-centric computing and machine-learning models. Progress is held back by the lack of software and algorithms that can efficiently handle large data sets and exploit the ubiquitous availability of neural engines. Legacy distributed matrix packages based on complex data distribution schemes not only add friction to adoption in new areas but also impede the exploration of cutting-edge algorithms at scale. Exciting new algorithms that are synergistic with neural engines, such as randomized linear algebra, structured matrix computation, and advanced eigendecompositions, remain unexplored, ad hoc, or hard for non-experts in numerical analysis to use. Powerful new architectures (neural engines) promise orders-of-magnitude gains in performance and energy efficiency but remain challenging to use outside of neural networks. This proposal aims to create a unified software system that achieves high-performance, scalable, distributed matrix and tensor decompositions on neural engines through concerted research and development.

This project pursues three research thrusts to achieve its goals.

A) In contrast to conventional arithmetic-centric algorithm design, this research focuses on communication-efficient algorithm variants. A central challenge in realizing the proposed goals is avoiding and managing data movement: computation on neural engines has become extremely fast, while data-movement latency and bandwidth lag far behind, and the gap is widening.

B) Incorporation of neural engines into state-of-the-art numerical algorithms. Recent numerical analysis has seen exciting developments in randomized algorithms, low-precision direct decompositions used as preconditioners, and novel polar-decomposition-based spectral divide-and-conquer methods for eigensystems. These developments are not only exciting in their own right; they also have the potential to exploit neural engines especially well and to blend naturally with communication-centric algorithms.

C) Exploration of the Universal Distributed Array (UDA), a new data structure based on a multi-dimensional cyclic data distribution scheme, to achieve load balancing, scalability, and unified support for all matrix and tensor decompositions. This proposal extends the cyclic data distribution scheme in two directions: through flexible alignment, to support communication-efficient algorithms, including recursive algorithms; and to multiple dimensions, to support tensor decompositions.

The project will develop efficient, scalable, and easy-to-use communication and computational primitives on distributed neural engines, and will provide the most useful matrix and tensor decomposition algorithms as a composable and extensible library.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
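The abstract describes UDA only at a high level, as a data structure built on a multi-dimensional cyclic data distribution. To illustrate what such a distribution does, the sketch below assumes a plain block-cyclic layout (the d-dimensional generalization of the 2D block-cyclic layout used by ScaLAPACK) and maps a global index to its owning process coordinate and local index, together with the inverse map. The helpers `owner_and_local` and `global_index` and the `block`/`grid` parameters are hypothetical and are not the UDA API; the flexible alignment that UDA adds for recursive algorithms is not modeled.

```python
"""Sketch of a d-dimensional block-cyclic index mapping (hypothetical helpers, not UDA)."""
from collections import Counter
from itertools import product
from typing import Sequence, Tuple

Index = Tuple[int, ...]


def owner_and_local(global_idx: Sequence[int], block: Sequence[int],
                    grid: Sequence[int]) -> Tuple[Index, Index]:
    """Return (owning process coordinate, local index) for a global index.

    Along each dimension k the array is cut into blocks of block[k] entries,
    and those blocks are dealt round-robin to grid[k] processes.
    block[k] == 1 gives a purely cyclic distribution along that dimension.
    """
    if not (len(global_idx) == len(block) == len(grid)):
        raise ValueError("global_idx, block and grid must have the same length")
    proc, local = [], []
    for g, b, p in zip(global_idx, block, grid):
        blk, offset = divmod(g, b)              # global block number, offset inside it
        proc.append(blk % p)                    # blocks are dealt cyclically
        local.append((blk // p) * b + offset)   # position in the owner's local buffer
    return tuple(proc), tuple(local)


def global_index(proc: Sequence[int], local: Sequence[int],
                 block: Sequence[int], grid: Sequence[int]) -> Index:
    """Inverse of owner_and_local: (process coordinate, local index) -> global index."""
    out = []
    for q, l, b, p in zip(proc, local, block, grid):
        local_blk, offset = divmod(l, b)        # block number within the owner
        out.append((local_blk * p + q) * b + offset)
    return tuple(out)


if __name__ == "__main__":
    # 8x8 matrix, 2x2 blocks, 2x2 process grid.
    print(owner_and_local((5, 3), (2, 2), (2, 2)))            # ((0, 1), (3, 1))

    # Every process owns the same number of entries.
    counts = Counter(owner_and_local(ij, (2, 2), (2, 2))[0]
                     for ij in product(range(8), repeat=2))
    print(sorted(counts.values()))                            # [16, 16, 16, 16]

    # The same mapping applies unchanged to a 3-way tensor.
    print(owner_and_local((5, 3, 7), (2, 2, 2), (2, 2, 2)))   # ((0, 1, 1), (3, 1, 3))
```

Dealing blocks round-robin is the usual reason cyclic layouts balance load in factorizations: as the active trailing submatrix shrinks, it still spans every process in the grid, whereas a plain (non-cyclic) block layout would leave the processes that own only the leading rows and columns idle.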

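The abstract lists low-precision direct decompositions used as preconditioners among the recent developments that suit neural engines. Below is a minimal sketch of the simplest form of that idea, classic mixed-precision iterative refinement: the LU factorization (the O(n^3) part) is computed once in low precision, and cheap O(n^2) residual corrections in float64 recover a full-precision solution. Iterative refinement can be read as Richardson iteration preconditioned by the low-precision factors; Krylov-based variants (e.g. GMRES-based refinement) are not shown. As an assumption for portability, binary32 arithmetic is emulated by rounding every operation through Python's `struct` module, standing in for the reduced-precision arithmetic of neural engines (e.g. FP16 on tensor cores). This is not the project's code, and all function names are hypothetical.

```python
"""Mixed-precision iterative refinement sketch: low-precision LU, float64 residuals."""
import struct
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def to_f32(x: float) -> float:
    """Round a Python float (IEEE binary64) to the nearest IEEE binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_low(A: Matrix) -> Tuple[Matrix, List[int]]:
    """LU with partial pivoting, rounding every operation to binary32.

    Returns the packed factors (unit-lower L below the diagonal, U on and
    above it) and piv, where row k of the factors came from row piv[k] of A.
    """
    n = len(A)
    LU = [[to_f32(v) for v in row] for row in A]
    piv = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))   # partial pivoting
        if LU[p][k] == 0.0:
            raise ZeroDivisionError("matrix is singular in low precision")
        LU[k], LU[p] = LU[p], LU[k]
        piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            LU[i][k] = to_f32(LU[i][k] / LU[k][k])            # multiplier
            for j in range(k + 1, n):
                LU[i][j] = to_f32(LU[i][j] - to_f32(LU[i][k] * LU[k][j]))
    return LU, piv


def lu_solve_low(LU: Matrix, piv: List[int], b: Vector) -> Vector:
    """Solve A x = b with the binary32 factors (forward, then back substitution)."""
    n = len(LU)
    y = [to_f32(b[piv[i]]) for i in range(n)]                 # apply the row permutation
    for i in range(n):                                         # L y = P b, L unit lower
        for j in range(i):
            y[i] = to_f32(y[i] - to_f32(LU[i][j] * y[j]))
    for i in reversed(range(n)):                               # U x = y
        for j in range(i + 1, n):
            y[i] = to_f32(y[i] - to_f32(LU[i][j] * y[j]))
        y[i] = to_f32(y[i] / LU[i][i])
    return y


def refine(A: Matrix, b: Vector, iters: int = 5) -> Vector:
    """Iterative refinement: factor once in low precision, correct in float64."""
    LU, piv = lu_low(A)                      # the O(n^3) work, done in low precision
    x = lu_solve_low(LU, piv, b)             # initial low-precision solution
    for _ in range(iters):
        # Residual r = b - A x in full (binary64) precision.
        r = [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(A, b)]
        d = lu_solve_low(LU, piv, r)         # cheap correction reusing the factors
        x = [xi + di for xi, di in zip(x, d)]
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    b = [2.0, -2.0, 4.0]                     # exact solution is [1, -2, 3]
    LU, piv = lu_low(A)
    print("low precision only:", lu_solve_low(LU, piv, b))
    print("after refinement:  ", refine(A, b))
```

For a well-conditioned system like this one, the low-precision-only solution matches [1, -2, 3] to roughly single-precision accuracy, while the refined solution reaches roughly double-precision accuracy. Refinement can stall or diverge when the condition number approaches the inverse of the low-precision unit roundoff.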

Project Outcomes

Journal papers (1)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Fast Symmetric Eigenvalue Decomposition via WY Representation on Tensor Core

DOI:
10.1145/3572848.3577516
Year published:
2023
Journal:
ACM
Impact factor:
0
Authors:
Zhang, Shaoshuai;Shah, Ruchi;Ootomo, Hiroyuki;Yokota, Rio;Wu, Panruo
Corresponding author:
Wu, Panruo

Other publications by Panruo Wu

Extending checksum-based ABFT to tolerate soft errors online in iterative methods

DOI:
Year published:
2014
Journal:
International Conference on Parallel and Distributed Systems
Impact factor:
0
Authors:
Longxiang Chen;Dingwen Tao;Panruo Wu;Zizhong Chen
Corresponding author:
Zizhong Chen

Investigating half precision arithmetic to accelerate dense linear system solvers

DOI:
10.1145/3148226.3148237
Year published:
2017
Journal:
Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems
Impact factor:
0
Authors:
A. Haidar;Panruo Wu;S. Tomov;J. Dongarra
Corresponding author:
J. Dongarra

High Accuracy Matrix Computations on Neural Engines: A Study of QR Factorization and its Applications

DOI:
Year published:
2020
Journal:
IEEE International Symposium on High-Performance Parallel Distributed Computing
Impact factor:
0
Authors:
Shaoshuai Zhang;Elaheh Baharlouei;Panruo Wu
Corresponding author:
Panruo Wu

Silent Data Corruption Resilient Two-sided Matrix Factorizations

DOI:
Year published:
2017
Journal:
ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming
Impact factor:
0
Authors:
Panruo Wu;Nathan Debardeleben;Qiang Guan;S. Blanchard;Jieyang Chen;Dingwen Tao;Xin Liang;Kaiming Ouyang;Zizhong Chen
Corresponding author:
Zizhong Chen