XPS: FULL: FP: Collaborative Research: Taming parallelism: optimally exploiting high-throughput parallel architectures

Basic information

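Award number:
1439062
Principal investigator:
Kunal Agrawal
Amount:
approximately $330,300
Institution:
Washington University
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2014
Funding country:
United States
Period:
2014-09-01 to 2020-08-31
Status:
Completed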

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1439062&HistoricalAwards=false
Keywords:
XPS FULL FP Collaborative Research

Project abstract

Title: XPS: FULL: FP: Collaborative Research: Taming parallelism: Optimally exploiting high-throughput parallel architectures

Over the past decade, computer manufacturers have focused on producing "multicore" chips that package multiple, powerful computing cores on a single chip. Researchers have invested significant effort in developing methods for writing programs that can run efficiently on these cores. The basic idea is to allow programmers to write programs using a high-level programming model and to rely on an underlying compiler and runtime system to efficiently schedule these programs on multicore platforms. However, due to power and heat dissipation concerns, emerging "throughput-oriented" computing systems increasingly rely on far simpler computing cores to deliver parallel computing performance. These cores are much more efficient than traditional multicores, and can deliver much higher performance. Practitioners across numerous fields -- bioinformatics, data analytics, machine learning, etc. -- are deploying these systems to harness their power. Unfortunately, existing high-level programming models are targeted to multicore chips, and do not produce code that can run effectively on these new systems. As a result, practitioners are forced to rewrite their applications, with painstaking low-level optimization and scheduling. This project will develop schemes to adapt applications written for multicore systems to run efficiently on throughput-oriented processors. The intellectual merits are novel program optimizations that will transform multicore-oriented programs into forms that map efficiently to throughput-oriented processors, scheduling mechanisms that ensure that these throughput-oriented processors do not waste computational resources, and scheduling policies that ensure that the mechanisms are used effectively.

The project's broader significance and importance are that programmers will be able to write portable, high-performance and energy-efficient programs for both traditional multicore systems and throughput-oriented systems. Moreover, high-level programming models will be used to program the throughput-oriented machines, thus leading to a significant reduction of programming effort for practitioners in many science and engineering disciplines. Finally, outreach efforts enhance the project by providing training and mentoring to a diverse group of students.

Languages like Cilk provide support for "dynamic multithreading", which allows programmers to identify all of the parallelism in their program, while relying on sophisticated runtime systems to map that parallelism to available parallel execution hardware at runtime. However, Cilk-style execution is inappropriate for the vector-based parallelism found in SIMD units, GPUs and the Xeon Phi; vector parallelism requires finding identical computations performed on different data units. This project investigates a series of transformations that will morph Cilk-style programs into programs that expose vectorizable parallelism, allowing dynamic multithreading programs to be mapped to emerging throughput-oriented architectures. The enabling transformation involves transforming task-parallel applications into data-parallel applications by identifying similar tasks being performed at different points in the computation. This project develops a series of scheduling mechanisms and provably efficient scheduling policies that ensure that parallelizing dynamic multithreading applications on throughput-oriented architectures is effective. In this manner, this project enables portable applications that run efficiently both on multicores and on vector-based architectures.
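
The idea of turning task parallelism into data parallelism can be illustrated with a small sketch. This is not the project's actual transformation or runtime; it is a toy Python example under simplifying assumptions, and the function names `fib_tasks` and `fib_batched` are hypothetical. The first function mirrors the recursive, spawn-per-call style of a Cilk program. The second collects all pending tasks of one recursion level into a batch and applies the same operation to every element of the batch, which is the kind of uniform step that a vector unit could execute.

```python
def fib_tasks(n):
    """Task-parallel style: in a Cilk-like language each recursive
    call could be spawned as an independent task."""
    if n < 2:
        return n
    return fib_tasks(n - 1) + fib_tasks(n - 2)


def fib_batched(n):
    """Data-parallel style (illustrative only): similar tasks from
    different points of the recursion are grouped into one batch,
    and each step applies an identical operation to the whole batch."""
    frontier = [n]   # all tasks pending at the current recursion level
    total = 0
    while frontier:
        # Identical base-case test applied to every task in the batch.
        base = [k for k in frontier if k < 2]
        rec = [k for k in frontier if k >= 2]
        # Base cases contribute their value to the result.
        total += sum(base)
        # Identical expansion applied to every remaining task.
        frontier = [k - 1 for k in rec] + [k - 2 for k in rec]
    return total


if __name__ == "__main__":
    print(fib_tasks(10), fib_batched(10))
```

Both functions compute the same value. The batched version does the same total work, but organizes it as a sequence of uniform operations over arrays of tasks rather than as a tree of independent calls. The frontier grows exponentially with `n` here, so the sketch is only suitable for small inputs.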

Project outcomes

Journal papers (8)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Responsive parallelism with futures and state

DOI:
10.1145/3385412.3386013
Published:
2020
Journal:
Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation
Impact factor:
0
Authors:
Muller, Stefan K.;Singer, Kyle;Goldstein, Noah;Acar, Umut A.;Agrawal, Kunal;Lee, I-Ting Angelina
Corresponding author:
Lee, I-Ting Angelina

AMCilk: A Framework for Multiprogrammed Parallel Workloads

DOI:
Published:
2020
Journal:
& ANALYTICS
Impact factor:
0
Authors:
Wang, Zhe;Xu, Chen;Agrawal, Kunal;Li, Jing
Corresponding author:
Li, Jing

Processor-Oblivious Record and Replay

DOI:
10.1145/3365659
Published:
2019-12
Journal:
ACM Transactions on Parallel Computing (TOPC)
Impact factor:
0
Authors:
R. Utterback;Kunal Agrawal;I. Lee;Milind Kulkarni
Corresponding author:
R. Utterback;Kunal Agrawal;I. Lee;Milind Kulkarni

Extracting SIMD Parallelism from Recursive Task-Parallel Programs

DOI:
10.1145/3365663
Published:
2019-12
Journal:
ACM Transactions on Parallel Computing (TOPC)
Impact factor:
0
Authors:
Bin Ren;S. Balakrishna;Youngjoon Jo;S. Krishnamoorthy;Kunal Agrawal;Milind Kulkarni
Corresponding author:
Bin Ren;S. Balakrishna;Youngjoon Jo;S. Krishnamoorthy;Kunal Agrawal;Milind Kulkarni

Priority Scheduling for Interactive Applications

DOI:
10.1145/3350755.3400236
Published:
2020
Journal:
Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures
Impact factor:
0
Authors:
Singer, Kyle;Goldstein, Noah;Muller, Stefan K.;Agrawal, Kunal;Lee, I-Ting Angelina;Acar, Umut A.
Corresponding author:
Acar, Umut A.

Other publications by Kunal Agrawal

Brief Announcement: Green Paging and Parallel Paging

DOI:
Published:
2020
Journal:
Impact factor:
0
Authors:
Kunal Agrawal;William Kuszmaul;Michele Scquizzato
Corresponding author:
Michele Scquizzato

Intractability Issues in Mixed-Criticality Scheduling

DOI:
10.4230/lipics.ecrts.2018.11
Published:
2018
Journal:
IEEE Transactions on Computers
Impact factor:
3.7
Authors:
Kunal Agrawal;Sanjoy Baruah
Corresponding author:
Sanjoy Baruah

Distributed Load Balancing in the Face of Reappearance Dependencies

DOI:
10.1145/3626183.3659968
Published:
2024
Journal:
Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures
Impact factor:
0
Authors:
Kunal Agrawal;William Kuszmaul;Zhe Wang;Jinhao Zhao
Corresponding author:
Jinhao Zhao

Parallel Real-Time Scheduling of DAGs (Number: WUCSE-2013-25, 2013)

DOI:
Published:
2015
Journal:
Impact factor:
0
Authors:
Abusayeed Saifullah;D. Ferry;Jing Li;Kunal Agrawal;Chenyang Lu
Corresponding author:
Chenyang Lu

Analysis of classic algorithms on GPUs

DOI:
10.1109/hpcsim.2014.6903670
Published:
2014
Journal:
2014 International Conference on High Performance Computing & Simulation (HPCS)
Impact factor:
0
Authors:
Lin Ma;R. Chamberlain;Kunal Agrawal
Corresponding author:
Kunal Agrawal