Collaborative Research: SHF: Medium: HERMES: On-Device Distributed Machine Learning via Model-Hardware Co-Design

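Basic Information

Award number:
2107024
Principal investigator:
Gauri Joshi
Amount:
About $636,000
Awardee institution:
Carnegie-Mellon University
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2021
Funding country:
United States
Period:
2021-10-01 to 2024-09-30
Status:
Completed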

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2107024&HistoricalAwards=false
Keywords:
Collaborative Research SHF Medium HERMES

Project Abstract

Machine learning (ML) is poised to become the most disruptive technology in modern society by changing all aspects of how humans interact with each other or with the world around them. To be effective, ML models must use vast amounts of data and must be built and updated efficiently wherever and whenever new data, devices, or users are available. To satisfy consumer needs or stringent device or environmental constraints, ML systems must respond fast and use minimum energy whenever possible, especially in the context of widespread Internet-of-Things (IoT) devices. This project addresses this need by developing new approaches for distributed training that allow for fast and energy-efficient training in the field, directly on IoT devices. The results of this project are poised to directly impact a wide array of applications, ranging from human mobility tracking and prediction to real-time speech or language processing. Furthermore, the project aims to change how engineers are trained in a multidisciplinary fashion for dealing with the problem of efficiently designing distributed ML systems that respond in real time and with low energy cost to the availability of data, devices, or users. The project aims to develop a body of diverse research trainees, while expanding outreach to high-school and middle-school student populations. Given the unified interdisciplinary aspects of this work, its workforce development plan, and its industrial impact, this project enables wide collaboration among emerging or established engineers and industrial partners.

Most training of ML models is done centrally in the cloud, which fails to address user privacy concerns or meet response-time requirements, and becomes inapplicable when fast model updates are needed. While efficient on-device inference has been an intense focus of recent research, on-device distributed training and inference have not been addressed from response time and energy efficiency perspectives; this is particularly important for IoT, where the network plays a major part in both training and inference efficiency. To address these challenges, this project (dubbed HERMES) provides a unified multipronged approach for meeting real-time and energy constraints in an on-device distributed setting. HERMES ensures that ML methods and underlying hardware are co-designed, thereby addressing current challenges of private data sharing, communication overhead, or real-time and energy-efficient response of distributed ML. More specifically, HERMES includes: (i) a set of scalable approaches for hardware-aware, real-time, energy-efficient distributed training based on federated learning and distributed optimization that is robust to data and device variability; (ii) the co-design of ML model and hardware, comprising hyperparameter optimization that exploits hardware characteristics and identifies constraint-satisfying ML models, and hardware design exploration that efficiently finds constraint-satisfying architectures; and (iii) an analysis and prototyping infrastructure for demonstrating the benefits of resulting ML systems.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (7)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Federated Hyperparameter Tuning: Challenges, Baselines, and Connections to Weight-Sharing

DOI:
Published:
2021-06
Journal:
Impact factor:
0
Authors:
M. Khodak;Renbo Tu;Tian Li;Liam Li;Maria-Florina Balcan;Virginia Smith;Ameet Talwalkar
Corresponding authors:
M. Khodak;Renbo Tu;Tian Li;Liam Li;Maria-Florina Balcan;Virginia Smith;Ameet Talwalkar

Heterogeneous Ensemble Knowledge Transfer for Training Large Models in Federated Learning

DOI:
10.48550/arxiv.2204.12703
Published:
2022-04
Journal:
Impact factor:
0
Authors:
Yae Jee Cho;Andre Manoel;Gauri Joshi;Robert Sim;D. Dimitriadis
Corresponding authors:
Yae Jee Cho;Andre Manoel;Gauri Joshi;Robert Sim;D. Dimitriadis

Federated Learning under Distributed Concept Drift

DOI:
10.48550/arxiv.2206.00799
Published:
2022-06
Journal:
ArXiv
Impact factor:
0
Authors:
Ellango Jothimurugesan;Kevin Hsieh;Jianyu Wang;Gauri Joshi;Phillip B. Gibbons
Corresponding authors:
Ellango Jothimurugesan;Kevin Hsieh;Jianyu Wang;Gauri Joshi;Phillip B. Gibbons

Communication-Efficient and Model-Heterogeneous Personalized Federated Learning via Clustered Knowledge Transfer