Performance, Portability, and Productivity for Deep Learning Applications on Multi- and Many-Core Architectures (PPP-DL)
Basic Information
- Grant number: 470527619
- Host country: Germany
- Project category: Research Grants
- Funding country: Germany
- Project status: Ongoing
Project Abstract
Deep Learning (DL) is currently the most popular machine-learning method, solving a great variety of real-world problems in academia and industry. The success of DL applications depends critically on the quality of the software that implements DL algorithms for modern parallel architectures such as multi-core CPUs, Graphics Processing Units (GPUs), and Field-Programmable Gate Arrays (FPGAs). State-of-the-art DL frameworks such as TensorFlow and PyTorch rely on general-purpose libraries provided by vendors such as Intel or NVIDIA to achieve high performance. This causes major weaknesses in three fundamental aspects: i) suboptimal performance: many DL-specific optimizations cannot be applied because the libraries focus on general-purpose usage; ii) lack of both functional and performance portability: the libraries are designed and optimized specifically for the architectures of particular vendors only; iii) restricted user productivity: the libraries are limited to a fixed set of pre-implemented algorithms (e.g., matrix multiplication and convolutions), and integrating high-performance libraries into DL frameworks is cumbersome. This project will develop a novel, holistic approach to automatic code generation and optimization for DL applications targeting modern parallel architectures. Its overall goal is to address, in one combined approach, three major research challenges of high-performance computing for DL: Performance, Portability, and Productivity (PPP).
We plan to achieve this goal through the following new contributions: 1) a new algebraic formalism and a formalism-based Domain-Specific Language (DSL) for conveniently expressing and implementing established and emerging DL applications at a high level of abstraction, thereby contributing to programmer productivity; 2) a uniform low-level programming model for DL applications that enables functional portability of code, because it can be straightforwardly lowered to executable code in state-of-practice parallel programming approaches such as OpenMP, CUDA, and OpenCL; 3) a code generation mechanism for our DSL that achieves high, portable performance across various architectures and input/output characteristics by automatically generating auto-tunable code in our low-level programming model; 4) a systematic process, based on the emerging MLIR framework, that integrates our code generation mechanism into modern DL frameworks; 5) a new auto-tuning system that fully automatically optimizes our generated code via combined numerical search techniques; 6) a new analytical cost model that predicts the run time of DL applications expressed in our DSL on different architectures, in order to accelerate the auto-tuning process. We will experimentally compare our approach with state-of-the-art approaches in terms of all three aspects (performance, portability, and productivity) for a broad range of DL applications, parallel architectures, and real-world DL data sets.
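The abstract names an auto-tuning system driven by numerical search (contribution 5) and an analytical cost model meant to speed it up (contribution 6), but does not describe either. As a purely illustrative sketch under stated assumptions, and not the project's actual method, the Python below tunes the three tile sizes of a tiled matrix multiplication, one of the pre-implemented operations mentioned in the abstract. The cost model `estimated_cost` (a rough element-traffic estimate that rejects tiles whose working set exceeds a hypothetical cache capacity `cache_elems`), the candidate tile sizes, and all function names are assumptions made for illustration; a plain exhaustive search stands in for the project's combined numerical search techniques.

```python
import itertools
import math


def matmul_tiled(A, B, ti, tj, tk):
    """Compute C = A x B on nested lists using loop tiling with tile sizes (ti, tj, tk)."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    # Outer loops walk over tiles; inner loops compute one tile-sized block product.
    for ii in range(0, n, ti):
        for jj in range(0, p, tj):
            for kk in range(0, m, tk):
                for i in range(ii, min(ii + ti, n)):
                    row_c = C[i]
                    for k in range(kk, min(kk + tk, m)):
                        a, row_b = A[i][k], B[k]
                        for j in range(jj, min(jj + tj, p)):
                            row_c[j] += a * row_b[j]
    return C


def estimated_cost(n, m, p, ti, tj, tk, cache_elems):
    """Hypothetical analytical cost model: estimated number of elements loaded
    into a cache holding `cache_elems` elements, or None if one tile's working
    set (A, B, and C tiles together) does not fit."""
    working_set = ti * tk + tk * tj + ti * tj
    if working_set > cache_elems:
        return None
    num_tiles = math.ceil(n / ti) * math.ceil(p / tj) * math.ceil(m / tk)
    return num_tiles * working_set


def autotune(n, m, p, cache_elems, candidates=(1, 2, 4, 8, 16, 32, 64)):
    """Exhaustively rank all tile-size triples with the cost model; return the best."""
    best_cfg, best_cost = None, math.inf
    for ti, tj, tk in itertools.product(candidates, repeat=3):
        if ti > n or tj > p or tk > m:
            continue  # skip tiles larger than the problem
        cost = estimated_cost(n, m, p, ti, tj, tk, cache_elems)
        if cost is not None and cost < best_cost:
            best_cfg, best_cost = (ti, tj, tk), cost
    return best_cfg, best_cost


if __name__ == "__main__":
    cfg, cost = autotune(64, 64, 64, cache_elems=768)
    print(cfg, cost)  # (16, 16, 16) 49152
```

Ranking candidates with a model instead of timing each one is what makes such a model useful for accelerating the search; a practical auto-tuner would typically still measure the top-ranked configurations on the target hardware before choosing one.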
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Professor Dr. Sergei Gorlatch
Other grants by Professor Dr. Sergei Gorlatch
Collective Operations: Formal Framework, Equalities, Efficiency
- Grant number: 5264766
- Fiscal year: 2000
- Project category: Research Grants
Similar international grants
A method to improve capture of causal genetics and by extension, cross-population portability when constructing polygenic scores
- Grant number: 10679656
- Fiscal year: 2023
Collaborative Research: EAGER: Real-time Strategies and Synchronized Time Distribution Mechanisms for Enhanced Exascale Performance-Portability and Predictability
- Grant number: 2405142
- Fiscal year: 2023
- Project category: Standard Grant
Social applications of personal acitivity log data acquired with data portability right
- Grant number: 23H01528
- Fiscal year: 2023
- Project category: Grant-in-Aid for Scientific Research (B)
CAREER: GPU Performance Portability for Volunteer Computing through Heterogeneity-aware Autotuning
- Grant number: 2144384
- Fiscal year: 2022
- Project category: Continuing Grant
Collaborative Research: EAGER: Real-time Strategies and Synchronized Time Distribution Mechanisms for Enhanced Exascale Performance-Portability and Predictability
- Grant number: 2151021
- Fiscal year: 2022
- Project category: Standard Grant
Collaborative Research: EAGER: Real-time Strategies and Synchronized Time Distribution Mechanisms for Enhanced Exascale Performance-Portability and Predictability
- Grant number: 2151022
- Fiscal year: 2022
- Project category: Standard Grant
Collaborative Research: EAGER: Real-time Strategies and Synchronized Time Distribution Mechanisms for Enhanced Exascale Performance-Portability and Predictability
- Grant number: 2151020
- Fiscal year: 2022
- Project category: Standard Grant
Automating Matrix Code Optimization for Performance and Portability
- Grant number: RGPIN-2019-06516
- Fiscal year: 2022
- Project category: Discovery Grants Program - Individual
VENTETE - Novel collapsible cycle helmet with multi-density structural system, aiming to prevent traumatic brain injury and increase portability
- Grant number: 10034056
- Fiscal year: 2022
- Project category: Collaborative R&D
Automating Matrix Code Optimization for Performance and Portability
- Grant number: RGPIN-2019-06516
- Fiscal year: 2021
- Project category: Discovery Grants Program - Individual