BIGDATA: F: DKA: Collaborative Research: Theory and Algorithms for Parallel Probabilistic Inference with Big Data, via Big Model, in Realistic Distributed Computing Environments
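Basic information
- Award number: 1447721
- Principal investigator:
- Amount: $300,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2014
- Funding country: United States
- Period: 2014-09-01 to 2018-08-31
- Status: Completed
- Source:
- Keywords: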
Project abstract
This project develops a new framework that enables machine learning (ML) systems to automatically comprehend and mine massive and complex data via parallel Bayesian inference on large computer clusters. The research has a profound impact on the practice and direction of Big Learning. The developed technologies have a catalytic effect on both ML research and applications: ML scientists are able to rapidly experiment on novel, cutting-edge ML models with minimal programming effort, unhindered by the limitations of single machines. Researchers from other fields, like biology and social sciences, are able to run contemporary advanced ML methods that transcend the capabilities of simple models, yielding new scientific insights on data whose size would otherwise be daunting. Data scientists at small start-ups are able to conduct ML analytics with complex models, putting their capabilities on par with huge companies possessing dedicated engineering and infrastructure teams. Students and beginners are able to witness distributed ML in action with just a few lines of code, driving ML education to new heights. Technically, this research focuses on scaling up and parallelizing Bayesian machine learning, which provides a powerful, elegant and theoretically justified framework for modeling a wide variety of datasets. The research team develops a suite of complementary distributed inference algorithms for hierarchical Bayesian models, which cover most commonly used Bayesian ML methods. The project focuses on combining speed and scalability with theoretical guarantees that allow us to assess the accuracy of the resulting methods, and allow practitioners to make trade-offs between speed and accuracy. 
Rather than focus on a few disconnected models, the project develops techniques applicable to a broad spectrum of hierarchical Bayesian models, resulting in a toolkit of building blocks that can be combined as needed for arbitrary probabilistic models - be they parametric or nonparametric, discriminative or generative. This is in contrast to much existing work on parallel inference, which tends to focus on parallelization in a specific model and cannot be easily extended. The project provides a solid algorithmic foundation for learning on Big Data with powerful models. The research contributes to democratizing advanced and large-scale ML methods for broad applications, by offering the user and developer community a library of general-purpose parallelizable algorithms for working on diverse problems using computer clusters and the cloud, bridging the gap between practical needs from data and basic research in ML.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Sinead Williamson
ANOVA exemplars for understanding data drift
- DOI:
- Published: 2020-06
- Journal:
- Impact factor: 0
- Authors: Sinead Williamson
- Corresponding author: Sinead Williamson
Nonparametric Network Models for Link Prediction
- DOI:
- Published: 2016
- Journal:
- Impact factor: 0
- Authors: Sinead Williamson
- Corresponding author: Sinead Williamson
Accelerated parallel non-conjugate sampling for Bayesian non-parametric models
- DOI:
- Published: 2017
- Journal:
- Impact factor: 2.2
- Authors: M. Zhang; Sinead Williamson; F. Pérez
- Corresponding author: F. Pérez
Restricted Indian buffet processes
- DOI: 10.1007/s11222-016-9681-y
- Published: 2015
- Journal:
- Impact factor: 2.2
- Authors: F. Doshi; Sinead Williamson
- Corresponding author: Sinead Williamson
Slice sampling normalized kernel-weighted completely random measure mixture models
- DOI:
- Published: 2012
- Journal:
- Impact factor: 0
- Authors: N. Foti; Sinead Williamson
- Corresponding author: Sinead Williamson
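Similar NSFC grants
Molecular design, synthesis, and anti-HIV activity of DKA-DAPYs as dual HIV-1 reverse transcriptase/integrase inhibitors
- Award number: 21402148
- Approval year: 2014
- Amount: 250,000 CNY
- Award type: Young Scientists Fund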
Similar overseas grants
BIGDATA: F: DKA: Collaborative Research: Randomized Numerical Linear Algebra (RandNLA) for multi-linear and non-linear data
- Award number: 1661760
- Fiscal year: 2016
- Amount: $300,000
- Award type: Standard Grant
BIGDATA: F: DKA: Collaborative Research: High-Dimensional Statistical Machine Learning for Spatio-Temporal Climate Data
- Award number: 1664720
- Fiscal year: 2016
- Amount: $300,000
- Award type: Standard Grant
BIGDATA: F: DKA: Collaborative Research: Structured Nearest Neighbor Search in High Dimensions
- Award number: 1447473
- Fiscal year: 2015
- Amount: $300,000
- Award type: Standard Grant
BIGDATA: F: DKA: Collaborative Research: Structured Nearest Neighbor Search in High Dimensions
- Award number: 1447413
- Fiscal year: 2015
- Amount: $300,000
- Award type: Standard Grant
BIGDATA: F: DKA: Collaborative Research: Structured Nearest Neighbor Search in High Dimensions
- Award number: 1447476
- Fiscal year: 2015
- Amount: $300,000
- Award type: Standard Grant
BIGDATA: F: DKA: Collaborative Research: Randomized Numerical Linear Algebra (RandNLA) for multi-linear and non-linear data
- Award number: 1447283
- Fiscal year: 2014
- Amount: $300,000
- Award type: Standard Grant
BIGDATA: F: DKA: Collaborative Research: Dealing Efficiently with Big Social Network Data
- Award number: 1447554
- Fiscal year: 2014
- Amount: $300,000
- Award type: Continuing Grant
BIGDATA: IA: DKA: Collaborative Research: High-Throughput Connectomics
- Award number: 1447786
- Fiscal year: 2014
- Amount: $300,000
- Award type: Standard Grant
BIGDATA: F: DKA: Collaborative Research: High-Dimensional Statistical Machine Learning for Spatio-Temporal Climate Data
- Award number: 1447566
- Fiscal year: 2014
- Amount: $300,000
- Award type: Standard Grant
BIGDATA: F: DKA: Collaborative Research: High-Dimensional Statistical Machine Learning for Spatio-Temporal Climate Data
- Award number: 1447574
- Fiscal year: 2014
- Amount: $300,000
- Award type: Standard Grant