XPS: FULL: Broad-Purpose, Aggressively Asynchronous and Theoretically Sound Parallel Large-scale Machine Learning
Basic Information
- Award number: 1629559
- Amount: $625,400
- Institution country: United States
- Project type: Standard Grant
- Fiscal year: 2016
- Funding country: United States
- Duration: 2016-09-01 to 2020-08-31
- Project status: Completed
Project Abstract
Many artificial intelligence (AI) applications, such as image understanding and natural language processing, rely on machine learning (ML) methods to automatically extract valuable knowledge from Big Data (Big Learning). Efficient ML requires not only expertise in advanced mathematical models and algorithms but also experience with large computer clusters, where issues such as machine failures, memory and network bottlenecks, and inter-machine latencies must be handled through complex systems programming. This demand for dual skills often prevents large-scale AI from being democratized to wider user communities. It calls for a new framework that bridges ML and the distributed computing environment of a cluster through a simple, single-machine-like interface, so that ML practitioners can remain agnostic about backend details and quickly prototype or deploy ML programs on clusters. Solutions to this need remain rare. In this project the PIs develop a new general-purpose framework for ML on distributed systems. It offers highly efficient and theoretically justified protocols (e.g., communication, scheduling, and partitioning functions) that orchestrate a heterogeneous computer cluster so that it becomes programmable and acts like a single big computer, executing distributed ML programs correctly and at a speed orders of magnitude faster than current systems such as Hadoop and Spark. With this framework, data scientists will be able to conduct ML analytics with complex models on massive data without dedicated engineering and infrastructure teams, making Big Learning more readily accessible to society.
Specifically, over a four-year span, the proposed research focuses on three technical aims: (1) building a system framework for Big Learning, by developing a new architecture that supports both data- and model-parallel execution of large ML programs, using an intelligent scheduler, a parameter server, and a consistency controller that are configurable to provide flexible options for model/data parallelization, synchronization schemes, load balancing, fault tolerance, and multi-instance tenancy; (2) building a multi-level-abstraction programming interface that supports easy parallel programming of both basic and advanced ML algorithms for large-scale applications; and (3) conducting theoretical analysis of distributed ML algorithms on the proposed system, based on unique insights such as block consistency and error tolerance under bounded synchronism. The goal is a system framework that achieves general, automatic, and effective parallelization of ML programs.
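The abstract names a consistency controller and "bounded synchronism" but does not specify the protocol. As a rough illustration only, the sketch below assumes one common reading: a bounded-staleness rule in which a worker may begin a new iteration only while it is at most `staleness` iterations ahead of the slowest worker. The class name, method names, and parameters are hypothetical and are not taken from the project.

```python
class BoundedStalenessClock:
    """Hypothetical consistency controller tracking per-worker iteration clocks."""

    def __init__(self, num_workers: int, staleness: int):
        self.staleness = staleness          # assumed bound on how far a worker may lead
        self.clocks = [0] * num_workers     # completed iterations per worker

    def can_advance(self, worker: int) -> bool:
        # A worker may begin its next iteration only if its lead over the
        # slowest worker does not exceed the staleness bound.
        return self.clocks[worker] - min(self.clocks) <= self.staleness

    def advance(self, worker: int) -> bool:
        if not self.can_advance(worker):
            return False                    # caller would wait for slower workers
        self.clocks[worker] += 1
        return True


clock = BoundedStalenessClock(num_workers=2, staleness=1)
fast_steps = [clock.advance(0) for _ in range(3)]  # worker 0 races ahead, then blocks
slow_step = clock.advance(1)                        # worker 1 completes one iteration
fast_retry = clock.advance(0)                       # worker 0 is allowed to proceed again
```

With `staleness=0` the same rule reduces to lock-step execution, where no worker can start a second iteration until every worker has finished the first; larger values trade freshness of shared parameters for less waiting.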
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Eric Xing
What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions
- Published: 2024
- Impact factor: 0
- Authors: Sang Keun Choe; Hwijeen Ahn; Juhan Bae; Kewen Zhao; Minsoo Kang; Youngseog Chung; Adithya Pratapa; W. Neiswanger; Emma Strubell; Teruko Mitamura; Jeff Schneider; Eduard Hovy; Roger Grosse; Eric Xing
- Corresponding author: Eric Xing
Applications of artificial intelligence in public health: analyzing the built environment and addressing spatial inequities
- DOI: 10.1007/s10389-025-02444-x
- Published: 2025-03-19
- Impact factor: 1.600
- Authors: Ana Luiza Favarão Leão; Bernard Banda; Eric Xing; Sanketh Gudapati; Adeel Ahmad; Jonathan Lin; Srikumar Sastry; Nathan Jacobs; Rodrigo Siqueira Reis
- Corresponding author: Rodrigo Siqueira Reis
An exploratory study of self-supervised pre-training on partially supervised multi-label classification on chest X-ray images
- DOI: 10.1016/j.asoc.2024.111855
- Published: 2024
- Impact factor: 8.7
- Authors: Nanqing Dong; Michael Kampffmeyer; Haoyang Su; Eric Xing
- Corresponding author: Eric Xing
Other grants by Eric Xing
III: Small: Multiple Device Collaborative Learning in Real Heterogeneous and Dynamic Environments
- Award number: 2311990
- Fiscal year: 2023
- Amount: $625,400
- Project type: Standard Grant
ML Basis for Intelligence Augmentation:Toward Personalized Modeling, Reasoning under Data-Knowledge Symbiosis, and Interpretable Interaction for AI-assisted Human Decision-making
- Award number: 2040381
- Fiscal year: 2021
- Amount: $625,400
- Project type: Continuing Grant
Collaborative Research: SCH: Trustworthy and Explainable AI for Neurodegenerative Diseases
- Award number: 2123952
- Fiscal year: 2021
- Amount: $625,400
- Project type: Standard Grant
CNS Core: Small: Toward Globally-Optimal Resource Distribution and Computation Acceleration in Multi-Tenant and Heterogeneous Machine Learning Systems
- Award number: 2008248
- Fiscal year: 2020
- Amount: $625,400
- Project type: Standard Grant
III: Small: A New Approach to Latent Space Learning with Diversity-Inducing Regularization and Applications to Healthcare Data Analytics
- Award number: 1617583
- Fiscal year: 2016
- Amount: $625,400
- Project type: Standard Grant
BIGDATA: F: DKA: Collaborative Research: Theory and Algorithms for Parallel Probabilistic Inference with Big Data, via Big Model, in Realistic Distributed Computing Environments
- Award number: 1447676
- Fiscal year: 2014
- Amount: $625,400
- Project type: Standard Grant
III: Small: Collaborative Research: Efficient, Nonparametric and Local-Minimum-Free Latent Variable Models: With Application to Large-Scale Computer Vision and Genomics
- Award number: 1218282
- Fiscal year: 2012
- Amount: $625,400
- Project type: Continuing Grant
III: Small: Collaborative Research: Using Large-Scale Image Data for Online Social Media Analysis
- Award number: 1115313
- Fiscal year: 2011
- Amount: $625,400
- Project type: Standard Grant
Collaborative Research: Discovering and Exploiting Latent Communities in Social Media
- Award number: 1111142
- Fiscal year: 2011
- Amount: $625,400
- Project type: Standard Grant
Indexing, Mining and Modeling Spatio-Temporal Patterns of Gene Expressions
- Award number: 0640543
- Fiscal year: 2007
- Amount: $625,400
- Project type: Continuing Grant
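Similar NSFC (National Natural Science Foundation of China) grants
Study of doping effects and thin-film noise characteristics of cobalt-based full-Heusler alloys
- Award number: 51871067
- Approval year: 2018
- Amount: CNY 600,000
- Project type: General Program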
Similar overseas grants
Human-Robot Co-Evolution: Achieving the full potential of future workplaces
- Award number: DP240100938
- Fiscal year: 2024
- Amount: $625,400
- Project type: Discovery Projects
SAFER - Secure Foundations: Verified Systems Software Above Full-Scale Integrated Semantics
- Award number: EP/Y035976/1
- Fiscal year: 2024
- Amount: $625,400
- Project type: Research Grant
Collaborative Research: NSFGEO-NERC: Advancing capabilities to model ultra-low velocity zone properties through full waveform Bayesian inversion and geodynamic modeling
- Award number: 2341238
- Fiscal year: 2024
- Amount: $625,400
- Project type: Standard Grant
CAREER: Informed Testing — From Full-Field Characterization of Mechanically Graded Soft Materials to Student Equity in the Classroom
- Award number: 2338371
- Fiscal year: 2024
- Amount: $625,400
- Project type: Standard Grant
CAREER: From Flamelet to Full-Scale: Advancing Plasma-Assisted Combustion for Low-Emission Sustainable Fuels
- Award number: 2339518
- Fiscal year: 2024
- Amount: $625,400
- Project type: Continuing Grant
STTR Phase II: Dermatologist-level detection of suspicious pigmented skin lesions from high-resolution full-body images
- Award number: 2335086
- Fiscal year: 2024
- Amount: $625,400
- Project type: Cooperative Agreement
Toward carbon-neutral society: Development of a full-sustainable eco-friendly green mining process for gold recovery
- Award number: 24K17540
- Fiscal year: 2024
- Amount: $625,400
- Project type: Grant-in-Aid for Early-Career Scientists
Collaborative Research: NSFGEO-NERC: Advancing capabilities to model ultra-low velocity zone properties through full waveform Bayesian inversion and geodynamic modeling
- Award number: 2341237
- Fiscal year: 2024
- Amount: $625,400
- Project type: Continuing Grant
All Analogue Full-duplex Dual-receiver Radio for Wideband Mm-wave Communications
- Award number: EP/X041581/1
- Fiscal year: 2024
- Amount: $625,400
- Project type: Research Grant
Full mitigation of birefringence for high-precision optical experiments
- Award number: 24K00649
- Fiscal year: 2024
- Amount: $625,400
- Project type: Grant-in-Aid for Scientific Research (B)