OAC Core: SMALL: DeepJIMU: Model-Parallelism Infrastructure for Large-scale Deep Learning by Gradient-Free Optimization
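Basic information
- Award number: 2007976
- Principal investigator:
- Award amount: $498,600
- Host institution:
- Institution country: United States
- Project type: Standard Grant
- Fiscal year: 2020
- Funding country: United States
- Project period: 2020-10-01 to 2020-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract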
In recent years, deep neural networks (DNNs) have been increasingly used to obtain useful insights for scientific exploration, business management, security, and healthcare. The constant improvement of DNN model performance has been accompanied by an increase in model complexity and size, which indicates a clear trend toward larger and deeper models. This trend is especially pronounced in numerous important application domains, such as remote sensing, where super-high-resolution geospatial image processing is required. In such applications, very large models are hard to fit on a single computing device (e.g., a graphics processing unit, GPU) for training, which raises an urgent demand for partitioning such models across multiple computing devices and parallelizing the training process (i.e., model parallelism). However, model parallelism for DNNs has so far been poorly explored and is very difficult because of the inherent bottleneck of the backpropagation algorithm, in which the training of one layer closely depends on input from all the previous layers. To overcome these challenges, this project aims to open a radically new pathway toward model parallelism infrastructure for large-scale DNNs, based on optimization methods that do not rely on backpropagation for training. This project plans to address the challenges of training very large and very deep neural network models that require huge amounts of high-dimensional data. The project will develop new optimization techniques and distributed DNN training software infrastructure to enable wider application and deployment of model-parallel deep learning training. The project includes educational and engagement activities that will greatly increase the community's understanding of distributed machine learning algorithms and systems. Those activities include teaching and training students and peers, providing graduate and undergraduate students with new courses and with research and internship opportunities, and broadening the participation of underrepresented groups and students at local high schools.
This project brings together researchers in machine learning algorithms, distributed computing systems, remote sensing, and spatial data science to boost the performance and scalability of deep learning applications enhanced by model parallelism. Specifically, this project focuses on proposing and developing a suite of new model parallelism optimization algorithms and system infrastructure for training large-scale DNNs, especially for image processing of massive datasets in geospatial scientific research. To enable model parallelism in training, new gradient-free optimization methods are proposed that break down the whole DNN optimization problem into subproblems, which can then be solved separately and in parallel (by many workers) with high efficiency. The products of this project include new theories and algorithms for model parallelism, along with an efficient gradient-free DNN training framework with new scheduling and work balancing techniques. The project has the following research thrusts: 1) developing new gradient-free methods for training various types of DNNs; 2) designing an algorithmic and theoretical framework for model parallelization based on gradient-free optimization; and 3) building a scalable and efficient distributed training framework for a broad range of model-parallel DNN training applications, such as deep learning for large graphs and very deep convolutional neural networks for image processing. This project also involves both theoretical and experimental comparison between the new techniques and current state-of-the-art methods, including those using gradient-based optimization and pipeline parallelism.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
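The abstract describes breaking DNN optimization into subproblems that can be solved separately without backpropagation. The sketch below illustrates that general idea on a toy case and is not the project's algorithm: it assumes a two-layer linear network, introduces an auxiliary variable `Z` for the hidden activations, and alternates closed-form least-squares updates. The function names, the penalty weight `rho`, and the linear setting are all assumptions made for illustration.

```python
import numpy as np


def objective(X, Y, W1, W2, Z, rho):
    """Penalized loss: output fit plus a penalty tying Z to the first layer."""
    return np.sum((Y - Z @ W2) ** 2) + rho * np.sum((Z - X @ W1) ** 2)


def train_gradient_free(X, Y, hidden=8, rho=1.0, sweeps=20, seed=0):
    """Alternating minimization for a toy two-layer linear network.

    Hypothetical illustration: Z stands in for the hidden activations, so
    each layer's weights are fitted by least squares without backpropagation.
    """
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((X.shape[0], hidden))
    history = []
    for _ in range(sweeps):
        # With Z fixed, the two weight subproblems do not depend on each
        # other, so separate workers could solve them at the same time.
        W1 = np.linalg.lstsq(X, Z, rcond=None)[0]  # min ||Z - X W1||^2
        W2 = np.linalg.lstsq(Z, Y, rcond=None)[0]  # min ||Y - Z W2||^2
        # With the weights fixed, Z has a closed-form minimizer:
        # Z (W2 W2^T + rho I) = Y W2^T + rho X W1
        A = W2 @ W2.T + rho * np.eye(hidden)
        B = Y @ W2.T + rho * (X @ W1)
        Z = np.linalg.solve(A, B.T).T  # A is symmetric positive definite
        history.append(objective(X, Y, W1, W2, Z, rho))
    return W1, W2, Z, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 6))
    Y = rng.standard_normal((100, 2))
    _, _, _, history = train_gradient_free(X, Y)
    print("first sweep:", history[0], "last sweep:", history[-1])
```

Because every step exactly minimizes the penalized objective over one block of variables, the recorded objective cannot increase from sweep to sweep. The decoupling of `W1` and `W2` given `Z` is the property that makes layer-wise parallel training possible in this toy setting.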
Project outcomes
Journal articles (11)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
InfiniStore: Elastic Serverless Cloud Storage
- DOI: 10.14778/3587136.3587139
- Publication date: 2023
- Journal:
- Impact factor: 2.5
- Authors: Zhang, Jingyuan;Wang, Ao;Ma, Xiaolong;Carver, Benjamin;Newman, Nicholas John;Anwar, Ali;Rupprecht, Lukas;Tarasov, Vasily;Skourtis, Dimitrios;Yan, Feng
- Corresponding author: Yan, Feng
Saliency-Augmented Memory Completion for Continual Learning. SIAM International Conference on Data Mining (SDM 2023) (Acceptance Rate: 27.4%), accepted.
- DOI:
- Publication date: 2023
- Journal:
- Impact factor: 0
- Authors: Bai, Guangji;Ling, Chen;Gao, Yuyang;Zhao, Liang
- Corresponding author: Zhao, Liang
Functional Connectivity Prediction With Deep Learning for Graph Transformation
- DOI: 10.1109/tnnls.2022.3197337
- Publication date: 2022-08
- Journal:
- Impact factor: 10.4
- Authors: Negar Etemadyrad;Yuyang Gao;Qingzhe Li;Xiaojie Guo;F. Krueger;Qixiang Lin;D. Qiu;Liang Zhao
- Corresponding author: Negar Etemadyrad;Yuyang Gao;Qingzhe Li;Xiaojie Guo;F. Krueger;Qixiang Lin;D. Qiu;Liang Zhao
Toward Quantized Model Parallelism for Graph-Augmented MLPs Based on Gradient-Free ADMM Framework
- DOI: 10.1109/tnnls.2022.3223879
- Publication date: 2022-12
- Journal:
- Impact factor: 10.4
- Authors: Junxiang Wang;Hongyi Li;Zheng Chai;Yongchao Wang;Yue Cheng;Liang Zhao
- Corresponding author: Junxiang Wang;Hongyi Li;Zheng Chai;Yongchao Wang;Yue Cheng;Liang Zhao
Metagraph Aggregated Heterogeneous Graph Neural Network for Illicit Traded Product Identification in Underground Market
- DOI: 10.1109/icdm50108.2020.00022
- Publication date: 2020-11
- Journal:
- Impact factor: 0
- Authors: Yujie Fan;Yanfang Ye;Qian Peng;Jianfei Zhang;Yiming Zhang;Xusheng Xiao;C. Shi;Qi Xiong;Fudong Sh
- Corresponding author: Yujie Fan;Yanfang Ye;Qian Peng;Jianfei Zhang;Yiming Zhang;Xusheng Xiao;C. Shi;Qi Xiong;Fudong Sh
Other publications by Liang Zhao
Novel imidazolium stationary phase for high-performance liquid chromatography.
- DOI: 10.1016/j.chroma.2006.03.016
- Publication date: 2006-05
- Journal:
- Impact factor: 0
- Authors: Hongdeng Qiu;Shengxiang Jiang;Xia Liu*;Liang Zhao
- Corresponding author: Liang Zhao
The QUENDA-BOT: Autonomous Robot for Screw-Fixing Installation in Timber Building Construction
- DOI: 10.1109/case56687.2023.10260465
- Publication date: 2023
- Journal:
- Impact factor: 0
- Authors: Dinh Dang Khoa Le;Gibson Hu;Dikai Liu;R. Khonasty;Liang Zhao;Shoudong Huang;Pratik Shrestha;R. Belperio
- Corresponding author: R. Belperio
Phenotypic effects of the nurse Thylacospermum caespitosum on dependent plant species along regional climate stress gradients
- DOI: 10.1111/oik.04512
- Publication date: 2018-02
- Journal:
- Impact factor: 3.4
- Authors: Xingpei Jiang;Richard Michalet;Shuyan Chen;Liang Zhao;Xiangtai Wang;Chenyue Wang;Lizhe An;Sa Xiao
- Corresponding author: Sa Xiao
Interannual variability of dimethylsulfide in the Yellow Sea
- DOI: 10.1007/s00343-020-0480-0
- Publication date: 2022-02
- Journal:
- Impact factor: 1.6
- Authors: Sijia Wang;Qun Sun;Shuai Li;Jiawei Shen;Qian Liu;Liang Zhao
- Corresponding author: Liang Zhao
Tape-Assisted Photolithographic-Free Microfluidic Chip Cell Patterning for Tumor Metastasis Study
- DOI: 10.1021/acs.analchem.7b03225
- Publication date: 2017
- Journal:
- Impact factor: 7.4
- Authors: Liang Zhao;Tengfei Guo;Lirong Wang;Yang Liu;Ganyu Chen;Hao Zhou;Meiqin Zhang
- Corresponding author: Meiqin Zhang
Other grants by Liang Zhao
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
- Award number: 2403312
- Fiscal year: 2024
- Award amount: $498,600
- Project type: Standard Grant
CAREER: Uncovering Solar Wind Composition, Acceleration, and Origin through Observations, Modeling, and Machine Learning Methods
- Award number: 2237435
- Fiscal year: 2023
- Award amount: $498,600
- Project type: Continuing Grant
Travel: NSF Student Travel Support for the 2023 IEEE International Conference on Data Mining (IEEE ICDM 2023)
- Award number: 2324784
- Fiscal year: 2023
- Award amount: $498,600
- Project type: Standard Grant
SHINE: Understanding the Physical Connection of the in-situ Properties and Coronal Origins of the Solar Wind with a Novel Artificial Intelligence Investigation
- Award number: 2229138
- Fiscal year: 2022
- Award amount: $498,600
- Project type: Continuing Grant
III: Small: Graph Generative Deep Learning for Protein Structure Prediction
- Award number: 2110926
- Fiscal year: 2020
- Award amount: $498,600
- Project type: Standard Grant
CAREER: Spatial Network Deep Generative Modeling, Transformation, and Interpretation
- Award number: 2113350
- Fiscal year: 2020
- Award amount: $498,600
- Project type: Continuing Grant
CRII: III: Interpretable Models for Spatio-Temporal Event Forecasting using Social Sensors
- Award number: 2103745
- Fiscal year: 2020
- Award amount: $498,600
- Project type: Standard Grant
CAREER: Spatial Network Deep Generative Modeling, Transformation, and Interpretation
- Award number: 1942594
- Fiscal year: 2020
- Award amount: $498,600
- Project type: Continuing Grant
OAC Core: SMALL: DeepJIMU: Model-Parallelism Infrastructure for Large-scale Deep Learning by Gradient-Free Optimization
- Award number: 2106446
- Fiscal year: 2020
- Award amount: $498,600
- Project type: Standard Grant
III: Small: Deep Generative Models for Temporal Graph Generation and Interpretation
- Award number: 2007716
- Fiscal year: 2020
- Award amount: $498,600
- Project type: Standard Grant
Similar grants from the National Natural Science Foundation of China (NSFC)
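Molecular mechanism by which cholesterol hydroxylase CH25H promotes degradation of the hepatitis B virus proteins Core and Pre-core independently of its enzymatic activity
- Award number: 82371765
- Approval year: 2023
- Award amount: CNY 500,000
- Project type: General Program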
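Development of 5f-in-core GTH pseudopotentials and basis sets for actinide elements
- Award number: 22303037
- Approval year: 2023
- Award amount: CNY 300,000
- Project type: Young Scientists Fund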
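Mechanistic study of Core-matched prodrug co-assemblies built on a synthetic lethality strategy to overcome tumor drug resistance
- Award number:
- Approval year: 2022
- Award amount: CNY 520,000
- Project type: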
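Mechanism by which the Salmonella Typhimurium LPS core promotes dendritic cell migration via CD209/SphK1 and aggravates inflammatory bowel disease
- Award number:
- Approval year: 2022
- Award amount: CNY 300,000
- Project type: Young Scientists Fund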
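Application and mechanism of a core-shell strategy for synchronously vascularized bone tissue engineering based on precise exosome regulation
- Award number:
- Approval year: 2020
- Award amount: CNY 550,000
- Project type: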
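Precise preparation and functional exploration of Core M3 mannosyl peptides of dystroglycan
- Award number: 92053110
- Approval year: 2020
- Award amount: CNY 700,000
- Project type: Major Research Plan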
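Molecular mechanism by which deficiency of Core-1 O-glycan mucins induces gastritis and mediates the progression of chronic gastritis to gastric cancer
- Award number: 81902805
- Approval year: 2019
- Award amount: CNY 205,000
- Project type: Young Scientists Fund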
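Core-merging giant impact events in the late accretion stage of the proto-Earth: new insights into core accretion, core-mantle equilibrium, and core-mantle boundary structure
- Award number: 41973063
- Approval year: 2019
- Award amount: CNY 650,000
- Project type: General Program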
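Workshop on CORDEX-CORE regional climate simulation and projection
- Award number: 41981240365
- Approval year: 2019
- Award amount: CNY 15,000
- Project type: International (Regional) Cooperation and Exchange Program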
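RBM38 regulates HBV replication by assisting Pol-ε binding and core recruitment
- Award number: 31900138
- Approval year: 2019
- Award amount: CNY 240,000
- Project type: Young Scientists Fund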
Similar overseas grants
Collaborative Research: OAC Core: Small: Anomaly Detection and Performance Optimization for End-to-End Data Transfers at Scale
- Award number: 2412329
- Fiscal year: 2023
- Award amount: $498,600
- Project type: Standard Grant
OAC Core: SHF: SMALL: ICURE -- In-situ Analytics with Compressed or Summary Representations for Extreme-Scale Architectures
- Award number: 2333899
- Fiscal year: 2023
- Award amount: $498,600
- Project type: Standard Grant
OAC Core: SHF: SMALL: ICURE -- In-situ Analytics with Compressed or Summary Representations for Extreme-Scale Architectures
- Award number: 2007775
- Fiscal year: 2020
- Award amount: $498,600
- Project type: Standard Grant
Collaborative Research: CNS core: OAC core: Small: New Techniques for I/O Behavior Modeling and Persistent Storage Device Configuration
- Award number: 2008324
- Fiscal year: 2020
- Award amount: $498,600
- Project type: Standard Grant
Collaborative Research: OAC Core: Small: Anomaly Detection and Performance Optimization for End-to-End Data Transfers at Scale
- Award number: 2007789
- Fiscal year: 2020
- Award amount: $498,600
- Project type: Standard Grant
Collaborative Research: CNS core: OAC core: Small: New Techniques for I/O Behavior Modeling and Persistent Storage Device Configuration
- Award number: 2008072
- Fiscal year: 2020
- Award amount: $498,600
- Project type: Standard Grant
Collaborative Research: OAC Core: Small: Efficient and Policy-driven Burst Buffer Sharing
- Award number: 2008388
- Fiscal year: 2020
- Award amount: $498,600
- Project type: Standard Grant
OAC Core: Small: Collaborative Research: Conversational Agents for Supporting Sustainable Implementation and Systemic Diffusion of Cyberinfrastructure and Science Gateways
- Award number: 2007100
- Fiscal year: 2020
- Award amount: $498,600
- Project type: Standard Grant
OAC Core: Small: Collaborative Research: Conversational Agents for Supporting Sustainable Implementation and Systemic Diffusion of Cyberinfrastructure and Science Gateways
- Award number: 2006816
- Fiscal year: 2020
- Award amount: $498,600
- Project type: Standard Grant
OAC Core: Small: Open-Source Robust 4D Reconstruction Framework for Real-Time Dynamic Human Capture
- Award number: 2007661
- Fiscal year: 2020
- Award amount: $498,600
- Project type: Standard Grant


















