Collaborative Research: PPoSS: Planning: Integrated Scalable Platform for Privacy-aware Collaborative Learning and Inference

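Basic Information

Award number:
2028839
Principal investigator:
Jimeng Sun
Amount:
$50,000
Host institution:
University of Illinois at Urbana-Champaign
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2020
Funding country:
United States
Project period:
2020-10-01 to 2022-09-30
Project status:
Completed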

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2028839&HistoricalAwards=false
Keywords:
Collaborative Research PPoSS Planning Integrated

Project Abstract

Building scalable distributed heterogeneous systems of the future with easy-to-program software is broadly acknowledged to be a grand challenge. It is widely recognized that a major disruption is currently under way in the design of computer systems as processors strive to extend, and go beyond, the end-game of Moore’s Law. This disruption is manifest in new forms of heterogeneous and distributed processors and memories at all scales (on-chip, on-die, on-node, on-rack, on-cluster, and on-data-center), rendering scalability as a fundamental challenge at all levels. Healthcare analytics offers a unique opportunity to explore scalable system design for the 21st century because there has been a tectonic shift in the ability of medical institutions to capture and store medical data, and to even stream data in real time. This shift has already contributed to an ecosystem of Machine Learning (ML) models being trained for a variety of clinical tasks. A new distributed heterogeneous architecture is required to build systems that can develop and deploy ML models based on distributed healthcare data that must necessarily be accessed with privacy-preserving constraints. Further, the proposed architecture must be accompanied by a software framework that can address the needs of domain-specific data scientists to develop and augment ML models being deployed in their hospitals.

This planning grant project is exploring the foundational principles necessary in building integrated scalable distributed systems of the future, so as to prepare for submitting a full proposal to the PPoSS program. It uses the domain of healthcare analytics to motivate and concretize the research agenda, but the principles developed in this research should be applicable to other application domains as well. The exploration focuses on demonstrating an integrated platform that spans multiple levels of distribution and heterogeneity of computation and storage, while also obeying important privacy constraints. While recent progress on the use of ML in healthcare applications has been encouraging, current approaches do not a) scale to the degrees of parallelism, heterogeneity, and distribution that will be required in future systems, or b) support the soft real-time responsiveness to streaming data that is needed in many clinical situations. The originality of this project can be seen in the integration of distribution, heterogeneity, and privacy considerations in a single unified software/hardware stack, which includes adaptive resource management that spans privacy-preserving federated continuous learning, automatic specialization of ML models at individual sites, and automatic selection of ML models best suited for specific clinical tasks that maximize accuracy subject to different latency and soft real-time constraints.

This project’s end-to-end approach to develop foundational scalability principles will impact multiple areas of computer science through publications, tutorials and courses, thereby benefiting other researchers working on scalability challenges in future distributed heterogeneous systems. The use of healthcare analytics as a driving application has the potential to result in significant benefits to society, by demonstrating how knowledge distilled from multiple sources of data can be embodied in recommendation systems that can run onsite to provide time-critical decision support to physicians. As a further impact, the project will contribute to the training of Highly Qualified Personnel (HQP) at the intersection of Systems for ML and ML for Healthcare, two emerging inter-disciplinary communities that are currently growing independent of each other. Finally, this research will leverage existing activities at the PIs’ institutions that contribute to broadening participation of underrepresented groups in computing.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (1)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

AID: Active Distillation Machine to Leverage Pre-Trained Black-Box Models in Private Data Settings

DOI:
10.1145/3442381.3449944
Publication date:
2021
Journal:
The Web Conference
Impact factor:
0
Authors:
Hoang, Trong Nghia;Hong, Shenda;Xiao, Cao;Low, Bryan;Sun, Jimeng
Corresponding author:
Sun, Jimeng

Other publications by Jimeng Sun

Mining large graphs and streams using matrix and tensor tools

DOI:
Publication date:
2007
Journal:
ACM SIGMOD Conference
Impact factor:
0
Authors:
C. Faloutsos;T. Kolda;Jimeng Sun
Corresponding author:
Jimeng Sun

A perspective for adapting generalist AI to specialized medical AI applications and their challenges

DOI:
10.1038/s41746-025-01789-7
Publication date:
2025-07-11
Journal:
npj Digital Medicine
Impact factor:
15.100
Authors:
Zifeng Wang;Hanyin Wang;Benjamin Danek;Ying Li;Christina Mack;Luk Arbuckle;Devyani Biswal;Hoifung Poon;Yajuan Wang;Pranav Rajpurkar;Cao Xiao;Jimeng Sun
Corresponding author:
Jimeng Sun

Disease-Specific Risk Prediction through Stability Selection using Electronic Health Records

DOI:
Publication date:
2012
Journal:
Impact factor:
0
Authors:
Jiayu Zhou;Jimeng Sun;Yashu Liu;Jianying Hu;Jieping Ye
Corresponding author:
Jieping Ye

Recent Advances in Predictive Modeling with Electronic Health Records

DOI:
10.48550/arxiv.2402.01077
Publication date:
2024
Journal:
arXiv
Impact factor:
0
Authors:
Jiaqi Wang;Junyu Luo;Muchao Ye;Xiaochen Wang;Yuan Zhong;Aofei Chang;Guanjie Huang;Ziyi Yin;Cao Xiao;Jimeng Sun;Fenglong Ma
Corresponding author:
Fenglong Ma