OAC Core: SMALL: DeepJIMU: Model-Parallelism Infrastructure for Large-scale Deep Learning by Gradient-Free Optimization
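Basic information
- Award number: 2007976
- Principal investigator:
- Amount: $498,600
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2020
- Funding country: United States
- Duration: 2020-10-01 to 2020-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract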
In recent years, deep neural networks (DNNs) have been increasingly used to obtain useful insights for scientific exploration, business management, security, and healthcare. The constant improvement of DNN model performance has been accompanied by an increase in model complexity and size, which indicates a clear trend toward larger and deeper models. This trend is especially pronounced in numerous important application domains, such as remote sensing, where super-high-resolution geospatial image processing is required. Such applications make it very challenging to fit the training of very large models on a single computing device (e.g., a graphics processing unit, GPU), and hence raise urgent demands for partitioning such models across multiple computing devices and parallelizing the training process (i.e., model parallelism). However, until now model parallelism for DNNs has been poorly explored and is very difficult due to the inherent bottleneck of the backpropagation algorithm, where the training of one layer closely depends on input from all the previous layers. To overcome these challenges, this project pursues a radically new pathway toward model parallelism infrastructure for large-scale DNNs, based on optimization methods that do not rely on backpropagation for training. This project plans to address the challenges of training very large and very deep neural network models that require huge amounts of high-dimensional data. The project will develop new optimization techniques and distributed DNN training software infrastructure to enable wider application and deployment of model-parallel deep learning training. The project includes educational and engagement activities that will greatly increase the community's understanding of distributed machine learning algorithms and systems. Those activities include teaching and training students and peers, providing graduate and undergraduate students with new courses and with research and internship opportunities, and broadening participation of underrepresented groups and students at local high schools.
This project brings together researchers in machine learning algorithms, distributed computing systems, remote sensing, and spatial data science to boost the performance and scalability of deep learning applications enhanced by model parallelism. Specifically, this project focuses on proposing and developing a suite of new model parallelism optimization algorithms and system infrastructure for training large-scale DNNs, especially for image processing of massive datasets for geospatial scientific research. To enable model parallelism in training, new gradient-free optimization methods are proposed to break down the whole problem of DNN optimization into subproblems, which can then be solved separately in parallel (by many workers) with high efficiency. The products of this project include new theories and algorithms for model parallelism, along with an efficient gradient-free DNN training framework with new scheduling and work balancing techniques. This project has the following research thrusts: 1) developing new gradient-free methods for training various types of DNNs; 2) designing an algorithmic and theoretical framework of model parallelization based on gradient-free optimization; and 3) building a scalable and efficient distributed training framework for a broad range of model-parallel DNN training applications, such as deep learning for large graphs and very deep convolutional neural networks for image processing. This project also involves both theoretical and experimental comparison between the new techniques and current state-of-the-art methods, including those using gradient-based optimization and pipeline parallelism.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal articles (11)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
InfiniStore: Elastic Serverless Cloud Storage
- DOI: 10.14778/3587136.3587139
- Published: 2023
- Journal:
- Impact factor: 2.5
- Authors: Zhang, Jingyuan; Wang, Ao; Ma, Xiaolong; Carver, Benjamin; Newman, Nicholas John; Anwar, Ali; Rupprecht, Lukas; Tarasov, Vasily; Skourtis, Dimitrios; Yan, Feng
- Corresponding author: Yan, Feng
Saliency-Augmented Memory Completion for Continual Learning. SIAM International Conference on Data Mining (SDM 2023) (Acceptance Rate: 27.4%), accepted.
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Bai, Guangji; Ling, Chen; Gao, Yuyang; Zhao, Liang
- Corresponding author: Zhao, Liang
Functional Connectivity Prediction With Deep Learning for Graph Transformation
- DOI: 10.1109/tnnls.2022.3197337
- Published: 2022-08
- Journal:
- Impact factor: 10.4
- Authors: Negar Etemadyrad; Yuyang Gao; Qingzhe Li; Xiaojie Guo; F. Krueger; Qixiang Lin; D. Qiu; Liang Zhao
- Corresponding author: Negar Etemadyrad; Yuyang Gao; Qingzhe Li; Xiaojie Guo; F. Krueger; Qixiang Lin; D. Qiu; Liang Zhao
Toward Quantized Model Parallelism for Graph-Augmented MLPs Based on Gradient-Free ADMM Framework
- DOI: 10.1109/tnnls.2022.3223879
- Published: 2022-12
- Journal:
- Impact factor: 10.4
- Authors: Junxiang Wang; Hongyi Li; Zheng Chai; Yongchao Wang; Yue Cheng; Liang Zhao
- Corresponding author: Junxiang Wang; Hongyi Li; Zheng Chai; Yongchao Wang; Yue Cheng; Liang Zhao
Metagraph Aggregated Heterogeneous Graph Neural Network for Illicit Traded Product Identification in Underground Market
- DOI: 10.1109/icdm50108.2020.00022
- Published: 2020-11
- Journal:
- Impact factor: 0
- Authors: Yujie Fan; Yanfang Ye; Qian Peng; Jianfei Zhang; Yiming Zhang; Xusheng Xiao; C. Shi; Qi Xiong; Fudong Sh
- Corresponding author: Yujie Fan; Yanfang Ye; Qian Peng; Jianfei Zhang; Yiming Zhang; Xusheng Xiao; C. Shi; Qi Xiong; Fudong Sh
Other publications by Liang Zhao
Novel imidazolium stationary phase for high-performance liquid chromatography.
- DOI: 10.1016/j.chroma.2006.03.016
- Published: 2006-05
- Journal:
- Impact factor: 0
- Authors: Hongdeng Qiu; Shengxiang Jiang; Xia Liu*; Liang Zhao
- Corresponding author: Liang Zhao
Large-Scale Text Classification Using Scope-Based Convolutional Neural Network: A Deep Learning Approach
- DOI: 10.1109/access.2019.2955924
- Published: 2019
- Journal:
- Impact factor: 3.9
- Authors: Jiaying Wang; Yaxin Li; Jing Shan; Jinling Bao; Chuanyu Zong; Liang Zhao
- Corresponding author: Liang Zhao
Phenotypic effects of the nurse Thylacospermum caespitosum on dependent plant species along regional climate stress gradients
- DOI: 10.1111/oik.04512
- Published: 2018-02
- Journal:
- Impact factor: 3.4
- Authors: Xingpei Jiang; Richard Michalet; Shuyan Chen; Liang Zhao; Xiangtai Wang; Chenyue Wang; Lizhe An; Sa Xiao
- Corresponding author: Sa Xiao
A visualized study of interfacial behavior of air–water two-phase flow in a rectangular Venturi channel
- DOI: 10.1016/j.taml.2018.05.004
- Published: 2018-09
- Journal:
- Impact factor: 3.4
- Authors: Jiang Huang; Licheng Sun; Min Du; Zhengyu Mo; Liang Zhao
- Corresponding author: Liang Zhao
Simultaneous Photo‐Induced Magnetic and Dielectric Switching in an Iron(II)‐Based Spin‐Crossover Hofmann‐Type Metal‐Organic Framework
- DOI: 10.1002/anie.202208208
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: Nian-Tao Yao; Liang Zhao; Hui-Ying Sun; Cheng Yi; Ya-Hui Guan; Ya-Ming Li; Hiroki Oshio; Yin-Shan Meng; Tao Liu
- Corresponding author: Tao Liu
Other grants by Liang Zhao
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
- Award number: 2403312
- Fiscal year: 2024
- Amount: $498,600
- Project type: Standard Grant
CAREER: Uncovering Solar Wind Composition, Acceleration, and Origin through Observations, Modeling, and Machine Learning Methods
- Award number: 2237435
- Fiscal year: 2023
- Amount: $498,600
- Project type: Continuing Grant
Travel: NSF Student Travel Support for the 2023 IEEE International Conference on Data Mining (IEEE ICDM 2023)
- Award number: 2324784
- Fiscal year: 2023
- Amount: $498,600
- Project type: Standard Grant
SHINE: Understanding the Physical Connection of the in-situ Properties and Coronal Origins of the Solar Wind with a Novel Artificial Intelligence Investigation
- Award number: 2229138
- Fiscal year: 2022
- Amount: $498,600
- Project type: Continuing Grant
III: Small: Graph Generative Deep Learning for Protein Structure Prediction
- Award number: 2110926
- Fiscal year: 2020
- Amount: $498,600
- Project type: Standard Grant
CAREER: Spatial Network Deep Generative Modeling, Transformation, and Interpretation
- Award number: 2113350
- Fiscal year: 2020
- Amount: $498,600
- Project type: Continuing Grant
CRII: III: Interpretable Models for Spatio-Temporal Event Forecasting using Social Sensors
- Award number: 2103745
- Fiscal year: 2020
- Amount: $498,600
- Project type: Standard Grant
CAREER: Spatial Network Deep Generative Modeling, Transformation, and Interpretation
- Award number: 1942594
- Fiscal year: 2020
- Amount: $498,600
- Project type: Continuing Grant
OAC Core: SMALL: DeepJIMU: Model-Parallelism Infrastructure for Large-scale Deep Learning by Gradient-Free Optimization
- Award number: 2106446
- Fiscal year: 2020
- Amount: $498,600
- Project type: Standard Grant
III: Small: Deep Generative Models for Temporal Graph Generation and Interpretation
- Award number: 2007716
- Fiscal year: 2020
- Amount: $498,600
- Project type: Standard Grant
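Similar NSFC (National Natural Science Foundation of China) grants
Molecular mechanism by which cholesterol hydroxylase CH25H promotes degradation of the hepatitis B virus Core and Pre-core proteins independently of its enzymatic activity
- Award number: 82371765
- Year approved: 2023
- Amount: CNY 500,000
- Project type: General Program
Development of 5f-in-core GTH pseudopotentials and basis sets for actinide elements
- Award number: 22303037
- Year approved: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Mechanism of overcoming tumor drug resistance with Core-matched prodrug co-assemblies built on a synthetic lethality strategy
- Award number:
- Year approved: 2022
- Amount: CNY 520,000
- Project type:
Mechanism by which the Salmonella Typhimurium LPS core promotes dendritic cell migration via CD209/SphK1 and aggravates inflammatory bowel disease
- Award number:
- Year approved: 2022
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Application and mechanism of a core-shell synchronous-vascularization bone tissue engineering strategy based on precise exosome regulation
- Award number:
- Year approved: 2020
- Amount: CNY 550,000
- Project type:
Precise preparation and functional exploration of Core M3 mannosyl peptides of dystroglycan
- Award number: 92053110
- Year approved: 2020
- Amount: CNY 700,000
- Project type: Major Research Plan
Molecular mechanism by which deficiency of Core-1 O-glycan mucins induces gastritis and mediates the progression of chronic gastritis to gastric cancer
- Award number: 81902805
- Year approved: 2019
- Amount: CNY 205,000
- Project type: Young Scientists Fund
Core-merging giant impact events in the late accretion stage of the proto-Earth: new insights into core accretion, core-mantle equilibrium, and core-mantle boundary structure
- Award number: 41973063
- Year approved: 2019
- Amount: CNY 650,000
- Project type: General Program
Workshop on CORDEX-CORE regional climate simulation and projection
- Award number: 41981240365
- Year approved: 2019
- Amount: CNY 15,000
- Project type: International (Regional) Cooperation and Exchange Program
RBM38 regulates HBV replication by assisting Pol-ε binding and recruiting core
- Award number: 31900138
- Year approved: 2019
- Amount: CNY 240,000
- Project type: Young Scientists Fund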
Similar overseas grants
Collaborative Research: OAC Core: Small: Anomaly Detection and Performance Optimization for End-to-End Data Transfers at Scale
- Award number: 2412329
- Fiscal year: 2023
- Amount: $498,600
- Project type: Standard Grant
OAC Core: SHF: SMALL: ICURE -- In-situ Analytics with Compressed or Summary Representations for Extreme-Scale Architectures
- Award number: 2333899
- Fiscal year: 2023
- Amount: $498,600
- Project type: Standard Grant
OAC Core: SHF: SMALL: ICURE -- In-situ Analytics with Compressed or Summary Representations for Extreme-Scale Architectures
- Award number: 2007775
- Fiscal year: 2020
- Amount: $498,600
- Project type: Standard Grant
Collaborative Research: CNS core: OAC core: Small: New Techniques for I/O Behavior Modeling and Persistent Storage Device Configuration
- Award number: 2008324
- Fiscal year: 2020
- Amount: $498,600
- Project type: Standard Grant
Collaborative Research: OAC Core: Small: Anomaly Detection and Performance Optimization for End-to-End Data Transfers at Scale
- Award number: 2007789
- Fiscal year: 2020
- Amount: $498,600
- Project type: Standard Grant
Collaborative Research: CNS core: OAC core: Small: New Techniques for I/O Behavior Modeling and Persistent Storage Device Configuration
- Award number: 2008072
- Fiscal year: 2020
- Amount: $498,600
- Project type: Standard Grant
Collaborative Research: OAC Core: Small: Efficient and Policy-driven Burst Buffer Sharing
- Award number: 2008388
- Fiscal year: 2020
- Amount: $498,600
- Project type: Standard Grant
OAC Core: Small: Collaborative Research: Conversational Agents for Supporting Sustainable Implementation and Systemic Diffusion of Cyberinfrastructure and Science Gateways
- Award number: 2007100
- Fiscal year: 2020
- Amount: $498,600
- Project type: Standard Grant
OAC Core: Small: Collaborative Research: Conversational Agents for Supporting Sustainable Implementation and Systemic Diffusion of Cyberinfrastructure and Science Gateways
- Award number: 2006816
- Fiscal year: 2020
- Amount: $498,600
- Project type: Standard Grant
OAC Core: Small: Open-Source Robust 4D Reconstruction Framework for Real-Time Dynamic Human Capture
- Award number: 2007661
- Fiscal year: 2020
- Amount: $498,600
- Project type: Standard Grant