III: Small: Applying Relational Database Design Principles to Machine Learning System Design
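Basic information
- Award number: 2008240
- Principal investigator:
- Amount: $500,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2020
- Funding country: United States
- Period: 2020-10-01 to 2024-09-30
- Status: Completed
- Source:
- Keywords:
Project abstract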
Modern machine learning systems such as TensorFlow and PyTorch have revolutionized the development of machine learning models, making it possible to produce new and complex models in a short period of time. However, these systems have significant limitations. They are difficult to use for training large models over large data sets. A user must manually map computations and data to hardware in a distributed setting. Doing so incorrectly leads to system failure. Models cannot easily be decomposed and trained in parallel across different hardware infrastructures. When a user wishes to speed learning by adding more machines/workers, the result will in many cases be a longer training time. This project considers the fundamental question: What is the foundation upon which machine learning systems should be built so that they can easily facilitate distributed training of the largest models over the largest data sets?

The project will investigate the use of the relational model as the basis for machine learning system design, where matrices and tensors are decomposed and stored in relations. The relational model has long been the basis for database systems, which successfully process huge data sets using large clusters of machines. However, a number of research problems need to be addressed for relational systems to be the preferred platform for machine learning system implementation. First, there are many ways that a tensor can be decomposed and stored in a relation. There are complex interactions between the data representation, the computation being run, and the mapping of the computation to hardware. How can all of these be co-optimized? Second, unlike classical relational computations, machine learning computations are iterative, repeating the same computation many times. The compute kernels (matrix multiplications, convolutions, etc.) are very expensive, making cost-based optimization difficult. The project will investigate an entirely new paradigm for relational optimization, where rather than being statically optimized, a relational computation that is executed iteratively is treated as a Markov decision process that must be optimized over its lifetime of executions, so as to achieve minimum cost over all executions. Finally, when machine learning computations are expressed relationally, the underlying tuples store pieces of decomposed tensors. Those tuples are very large, and have constraints such as continuity of keys that are not present in general relational computations. The project will investigate the use of optimization-based relational algorithms that use those constraints to carefully place the data and plan communication so as to minimize the communication of such large objects.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
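
To make the abstract's idea of storing decomposed matrices in relations more concrete, here is a minimal single-process sketch. It assumes one possible decomposition (square tiles keyed by block indices) and expresses matrix multiplication as a join on the shared block index followed by a grouped sum. The function names `to_relation`, `relational_matmul`, and `from_relation` are hypothetical and are not taken from the project's software; a real system would distribute the join and aggregation across machines.

```python
import numpy as np


def to_relation(matrix, block):
    """Decompose a matrix into tuples (row_block, col_block, tile)."""
    rows, cols = matrix.shape
    return [
        (i // block, j // block, matrix[i:i + block, j:j + block])
        for i in range(0, rows, block)
        for j in range(0, cols, block)
    ]


def relational_matmul(rel_a, rel_b):
    """Join on A.col_block = B.row_block, then SUM tile products
    grouped by (A.row_block, B.col_block)."""
    groups = {}
    for a_row, a_col, a_tile in rel_a:
        for b_row, b_col, b_tile in rel_b:
            if a_col != b_row:  # join predicate
                continue
            product = a_tile @ b_tile  # per-tuple compute kernel
            key = (a_row, b_col)
            groups[key] = groups[key] + product if key in groups else product
    return [(i, j, tile) for (i, j), tile in sorted(groups.items())]


def from_relation(relation, block, shape):
    """Reassemble a dense matrix from its tile tuples."""
    result = np.zeros(shape)
    for i, j, tile in relation:
        r, c = i * block, j * block
        result[r:r + tile.shape[0], c:c + tile.shape[1]] = tile
    return result


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((4, 6))
    b = rng.standard_normal((6, 4))
    rel_c = relational_matmul(to_relation(a, 2), to_relation(b, 2))
    print(np.allclose(from_relation(rel_c, 2, (4, 4)), a @ b))
```

The tile size is the knob the abstract alludes to: larger tiles mean fewer, heavier tuples and less join overhead, while smaller tiles expose more parallelism, which is why the data representation and the hardware mapping have to be chosen together.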
Project outcomes
Journal articles (6)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Tensor Relational Algebra for Distributed Machine Learning System Design
- DOI: 10.14778/3457390.3457399
- Published: 2020-09
- Journal:
- Impact factor: 0
- Authors: Binhang Yuan;Dimitrije Jankov;Jia Zou;Yu-Shuen Tang;Daniel Bourgeois;C. Jermaine
- Corresponding authors: Binhang Yuan;Dimitrije Jankov;Jia Zou;Yu-Shuen Tang;Daniel Bourgeois;C. Jermaine
Distributed Learning of Fully Connected Neural Networks using Independent Subnet Training
- DOI: 10.14778/3529337.3529343
- Published: 2019-10
- Journal:
- Impact factor: 0
- Authors: Binhang Yuan;Anastasios Kyrillidis;C. Jermaine
- Corresponding authors: Binhang Yuan;Anastasios Kyrillidis;C. Jermaine
Auto-Differentiation of Relational Computations for Very Large Scale Machine Learning
- DOI: 10.48550/arxiv.2306.00088
- Published: 2023-05
- Journal:
- Impact factor: 0
- Authors: Yu-Shuen Tang;Zhimin Ding;Dimitrije Jankov;Binhang Yuan;Daniel Bourgeois;C. Jermaine
- Corresponding authors: Yu-Shuen Tang;Zhimin Ding;Dimitrije Jankov;Binhang Yuan;Daniel Bourgeois;C. Jermaine
Distributed Numerical and Machine Learning Computations via Two-Phase Execution of Aggregated Join Trees
- DOI: 10.14778/3450980.3450991
- Published: 2021-03
- Journal:
- Impact factor: 0
- Authors: Dimitrije Jankov;Binhang Yuan;Shangyu Luo;C. Jermaine
- Corresponding authors: Dimitrije Jankov;Binhang Yuan;Shangyu Luo;C. Jermaine
Automatic Optimization of Matrix Implementations for Distributed Machine Learning and Linear Algebra
- DOI: 10.1145/3448016.3457317
- Published: 2021-06
- Journal:
- Impact factor: 0
- Authors: Shangyu Luo;Dimitrije Jankov;Binhang Yuan;C. Jermaine
- Corresponding authors: Shangyu Luo;Dimitrije Jankov;Binhang Yuan;C. Jermaine
Other publications by Christopher Jermaine
Exploring phylogenetic hypotheses via Gibbs sampling on evolutionary networks
- DOI:
- Published: 2016
- Journal:
- Impact factor: 4.4
- Authors: Yun Yu;Christopher Jermaine;Luay K. Nakhleh
- Corresponding author: Luay K. Nakhleh
The Latent Community Model for Detecting Sybil Attacks in Social Networks
- DOI:
- Published: 2011
- Journal:
- Impact factor: 0
- Authors: Zhuhua Cai;Christopher Jermaine
- Corresponding author: Christopher Jermaine
Maintaining very large random samples using the geometric file
- DOI: 10.1007/s00778-007-0048-z
- Published: 2007-05-11
- Journal:
- Impact factor: 3.800
- Authors: Abhijit Pol;Christopher Jermaine;Subramanian Arumugam
- Corresponding author: Subramanian Arumugam
Other grants awarded to Christopher Jermaine
Collaborative Research: SHF: Medium: Semantics-Aware Neural Models of Code
- Award number: 2212557
- Fiscal year: 2022
- Amount: $500,000
- Award type: Standard Grant
Collaborative Research: CISE-MSI: RPEP: III: celtSTEM Research Collaborative: Catapulting MSI Faculty and Students into Computational Research.
- Award number: 2131294
- Fiscal year: 2021
- Amount: $500,000
- Award type: Standard Grant
MLWiNS: Wireless On-the-Edge Training of Deep Networks Using Independent Subnets
- Award number: 2003137
- Fiscal year: 2020
- Amount: $500,000
- Award type: Standard Grant
Expeditions: Collaborative Research: Understanding the World Through Code
- Award number: 1918651
- Fiscal year: 2020
- Amount: $500,000
- Award type: Continuing Grant
III: Small: Declarative Recursive Computation on a Database System
- Award number: 1910803
- Fiscal year: 2019
- Amount: $500,000
- Award type: Standard Grant
ABI Innovation: Algorithms and Models for Distributed Computation of Bayesian Phylogenetics
- Award number: 1355998
- Fiscal year: 2014
- Amount: $500,000
- Award type: Continuing Grant
III: Medium: SimSQL: A Database System Supporting Implementation and Execution of Distributed Machine Learning Codes
- Award number: 1409543
- Fiscal year: 2014
- Amount: $500,000
- Award type: Continuing Grant
III: Medium: Collaborative Research: Data Mining and Cleaning for Medical Data Warehouses
- Award number: 0964526
- Fiscal year: 2010
- Amount: $500,000
- Award type: Continuing Grant
III-COR-Medium: Design and Implementation of the DBO Database System
- Award number: 1007062
- Fiscal year: 2009
- Amount: $500,000
- Award type: Continuing Grant
Small: The MCDB Database System for Managing and Modeling Uncertainty
- Award number: 0915315
- Fiscal year: 2009
- Amount: $500,000
- Award type: Standard Grant
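Similar grants from the National Natural Science Foundation of China
Forensic application of circadian small RNAs in estimating the time of bloodstain formation
- Award number:
- Approval year: 2024
- Amount: CNY 0
- Award type: Provincial/municipal project
Mechanism by which tRNA-derived small RNA upregulates the YBX1/CCL5 pathway in bortezomib-induced chronic pain
- Award number: n/a
- Approval year: 2022
- Amount: CNY 100,000
- Award type: Provincial/municipal project
Small RNA regulation of the type I-F CRISPR-Cas adaptive immune response and its molecular mechanism
- Award number: 32000033
- Approval year: 2020
- Amount: CNY 240,000
- Award type: Young Scientists Fund
Mechanisms by which small RNAs regulate the biocontrol function of Bacillus amyloliquefaciens FZB42
- Award number: 31972324
- Approval year: 2019
- Amount: CNY 580,000
- Award type: General Program
Mechanisms by which Streptococcus mutans small RNAs link LuxS quorum sensing to biofilm formation
- Award number: 81900988
- Approval year: 2019
- Amount: CNY 210,000
- Award type: Young Scientists Fund
Molecular mechanism of pigeon milk secretion analyzed by small RNA sequencing
- Award number: 31802058
- Approval year: 2018
- Amount: CNY 260,000
- Award type: Young Scientists Fund
Function and mechanism of key gut bacterial small RNAs in the onset and progression of Crohn's disease
- Award number: 31870821
- Approval year: 2018
- Amount: CNY 560,000
- Award type: General Program
Pathogenic mechanism of rice grassy stunt virus regulated by small RNA-mediated DNA methylation
- Award number: 31772128
- Approval year: 2017
- Amount: CNY 600,000
- Award type: General Program
Immune regulatory mechanism of acupuncture treatment for Hashimoto's thyroiditis based on small RNA-seq
- Award number: 81704176
- Approval year: 2017
- Amount: CNY 200,000
- Award type: Young Scientists Fund
Regulation of small RNA biosynthesis by rice OsSGS3 and OsHEN1 and its effect on disease resistance
- Award number: 91640114
- Approval year: 2016
- Amount: CNY 850,000
- Award type: Major Research Plan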
Similar overseas grants
Collaborative Research: Applying Ion-Exchange Chromatography-Supercritical Fluid Chromatography to Small Molecule Analysis
- Award number: 1904454
- Fiscal year: 2019
- Amount: $500,000
- Award type: Standard Grant
Collaborative Research: Applying Ion-Exchange Chromatography-Supercritical Fluid Chromatography to Small Molecule Analysis
- Award number: 1904919
- Fiscal year: 2019
- Amount: $500,000
- Award type: Standard Grant
Development of a Human Taste-Small Intestine-Brain Correlation Model applying for Nutritional Physiology
- Award number: 19K22988
- Fiscal year: 2019
- Amount: $500,000
- Award type: Grant-in-Aid for Challenging Research (Exploratory)
RI: Small: Applying discrete reasoning steps in solving natural language processing tasks
- Award number: 1814522
- Fiscal year: 2018
- Amount: $500,000
- Award type: Standard Grant
CHS: Small: Applying Intergroup Psychology to Overcome Barriers in Human-Robot Interaction
- Award number: 1617611
- Fiscal year: 2016
- Amount: $500,000
- Award type: Standard Grant
CSR: Small: Adaptively Applying Data-Driven Execution Mode to Remove I/O Bottleneck for Data-Intensive Computing
- Award number: 1217948
- Fiscal year: 2012
- Amount: $500,000
- Award type: Standard Grant
SHF: Small: Developing and Applying Reuse Distance Analysis Techniques for Large-Scale Multicore Processors
- Award number: 1117042
- Fiscal year: 2011
- Amount: $500,000
- Award type: Standard Grant
Investigation of mechanism of load pulsation, and applying for forming of parts having small thickness
- Award number: 22760555
- Fiscal year: 2010
- Amount: $500,000
- Award type: Grant-in-Aid for Young Scientists (B)
HCC-Small: Investigating and Supporting the Iterative and Exploratory Process of Applying Statistical Machine Learning
- Award number: 0812590
- Fiscal year: 2008
- Amount: $500,000
- Award type: Standard Grant
Model age determination for small geological units of the Moon by applying CSFD method and/or DL method to a large amount of image data
- Award number: 20540416
- Fiscal year: 2008
- Amount: $500,000
- Award type: Grant-in-Aid for Scientific Research (C)