BIGDATA: F: DKA: Collaborative Research: Theory and Algorithms for Parallel Probabilistic Inference with Big Data, via Big Model, in Realistic Distributed Computing Environments
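Basic Information
- Award number: 1447676
- Principal investigator:
- Amount: $500,000
- Host institution:
- Host institution country: United States
- Program type: Standard Grant
- Fiscal year: 2014
- Funding country: United States
- Period: 2014-09-01 to 2018-08-31
- Status: Completed
- Source:
- Keywords:
Project Abstract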
This project develops a new framework that enables machine learning (ML) systems to automatically comprehend and mine massive and complex data via parallel Bayesian inference on large computer clusters. The research has a profound impact on the practice and direction of Big Learning. The developed technologies have a catalytic effect on both ML research and applications: ML scientists are able to rapidly experiment on novel, cutting-edge ML models with minimal programming effort, unhindered by the limitations of single machines. Researchers from other fields, like biology and social sciences, are able to run contemporary advanced ML methods that transcend the capabilities of simple models, yielding new scientific insights on data whose size would otherwise be daunting. Data scientists at small start-ups are able to conduct ML analytics with complex models, putting their capabilities on par with huge companies possessing dedicated engineering and infrastructure teams. Students and beginners are able to witness distributed ML in action with just a few lines of code, driving ML education to new heights. Technically, this research focuses on scaling up and parallelizing Bayesian machine learning, which provides a powerful, elegant and theoretically justified framework for modeling a wide variety of datasets. The research team develops a suite of complementary distributed inference algorithms for hierarchical Bayesian models, which cover most commonly used Bayesian ML methods. The project focuses on combining speed and scalability with theoretical guarantees that allow us to assess the accuracy of the resulting methods, and allow practitioners to make trade-offs between speed and accuracy. 
Rather than focus on a few disconnected models, the project develops techniques applicable to a broad spectrum of hierarchical Bayesian models, resulting in a toolkit of building blocks that can be combined as needed for arbitrary probabilistic models, whether parametric or nonparametric, discriminative or generative. This contrasts with much existing work on parallel inference, which tends to parallelize a single specific model and cannot easily be extended. The project provides a solid algorithmic foundation for learning on Big Data with powerful models. The research contributes to democratizing advanced and large-scale ML methods for broad applications by offering the user and developer community a library of general-purpose parallelizable algorithms for solving diverse problems on computer clusters and in the cloud, bridging the gap between the practical needs arising from data and basic research in ML.
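The abstract does not spell out the project's algorithms. As a hedged illustration of the general idea of data-parallel Bayesian inference, the sketch below uses one well-known "embarrassingly parallel" scheme from the literature, not necessarily the method this project developed: each worker computes a subposterior on its own data shard using the prior raised to the power 1/M, and the M subposteriors are multiplied together. It assumes a toy conjugate model (Gaussian likelihood with known noise variance, Gaussian prior on the mean), for which the combination is exact; all function names and parameter values are hypothetical.

```python
import random


def gaussian_posterior(data, prior_mean, prior_var, noise_var):
    """Exact posterior N(mean, var) over mu, for data x_i ~ N(mu, noise_var)
    and prior mu ~ N(prior_mean, prior_var)."""
    precision = 1.0 / prior_var + len(data) / noise_var
    mean = (prior_mean / prior_var + sum(data) / noise_var) / precision
    return mean, 1.0 / precision


def subposterior(shard, prior_mean, prior_var, noise_var, num_shards):
    """Posterior computed by one worker on its shard, using the prior raised to
    the power 1/num_shards. For a Gaussian prior this multiplies its variance
    by num_shards, so the prior is not over-counted when shards are combined."""
    return gaussian_posterior(shard, prior_mean, prior_var * num_shards, noise_var)


def combine(subposteriors):
    """Multiply Gaussian subposterior densities: precisions add, and the mean is
    the precision-weighted average of the subposterior means."""
    precision = sum(1.0 / var for _, var in subposteriors)
    mean = sum(m / var for m, var in subposteriors) / precision
    return mean, 1.0 / precision


# Toy run (hypothetical data): 10,000 observations with true mean 2.0,
# split across 4 "workers".
random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(10_000)]
num_shards = 4
shards = [data[i::num_shards] for i in range(num_shards)]

# On a real cluster, each subposterior call would run on a different machine.
subs = [subposterior(s, 0.0, 10.0, 1.0, num_shards) for s in shards]
combined = combine(subs)
full = gaussian_posterior(data, 0.0, 10.0, 1.0)
print("combined:", combined)
print("single-machine:", full)
```

Because this toy model is conjugate, the combined and single-machine posteriors agree up to floating-point rounding. For general hierarchical models, workers would typically run MCMC or variational inference and the product of subposteriors can only be approximated, which is where accuracy guarantees and speed/accuracy trade-offs of the kind the abstract describes become relevant.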
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Eric Xing
What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Sang Keun Choe; Hwijeen Ahn; Juhan Bae; Kewen Zhao; Minsoo Kang; Youngseog Chung; Adithya Pratapa; W. Neiswanger; Emma Strubell; Teruko Mitamura; Jeff Schneider; Eduard Hovy; Roger Grosse; Eric Xing
- Corresponding author: Eric Xing
Applications of artificial intelligence in public health: analyzing the built environment and addressing spatial inequities
- DOI: 10.1007/s10389-025-02444-x
- Published: 2025-03-19
- Journal:
- Impact factor: 1.600
- Authors: Ana Luiza Favarão Leão; Bernard Banda; Eric Xing; Sanketh Gudapati; Adeel Ahmad; Jonathan Lin; Srikumar Sastry; Nathan Jacobs; Rodrigo Siqueira Reis
- Corresponding author: Rodrigo Siqueira Reis
An exploratory study of self-supervised pre-training on partially supervised multi-label classification on chest X-ray images
- DOI: 10.1016/j.asoc.2024.111855
- Published: 2024
- Journal:
- Impact factor: 8.7
- Authors: Nanqing Dong; Michael Kampffmeyer; Haoyang Su; Eric Xing
- Corresponding author: Eric Xing
Other grants by Eric Xing
III: Small: Multiple Device Collaborative Learning in Real Heterogeneous and Dynamic Environments
- Award number: 2311990
- Fiscal year: 2023
- Amount: $500,000
- Program type: Standard Grant
ML Basis for Intelligence Augmentation: Toward Personalized Modeling, Reasoning under Data-Knowledge Symbiosis, and Interpretable Interaction for AI-assisted Human Decision-making
- Award number: 2040381
- Fiscal year: 2021
- Amount: $500,000
- Program type: Continuing Grant
Collaborative Research: SCH: Trustworthy and Explainable AI for Neurodegenerative Diseases
- Award number: 2123952
- Fiscal year: 2021
- Amount: $500,000
- Program type: Standard Grant
CNS Core: Small: Toward Globally-Optimal Resource Distribution and Computation Acceleration in Multi-Tenant and Heterogeneous Machine Learning Systems
- Award number: 2008248
- Fiscal year: 2020
- Amount: $500,000
- Program type: Standard Grant
III: Small: A New Approach to Latent Space Learning with Diversity-Inducing Regularization and Applications to Healthcare Data Analytics
- Award number: 1617583
- Fiscal year: 2016
- Amount: $500,000
- Program type: Standard Grant
XPS: FULL: Broad-Purpose, Aggressively Asynchronous and Theoretically Sound Parallel Large-scale Machine Learning
- Award number: 1629559
- Fiscal year: 2016
- Amount: $500,000
- Program type: Standard Grant
III: Small: Collaborative Research: Efficient, Nonparametric and Local-Minimum-Free Latent Variable Models: With Application to Large-Scale Computer Vision and Genomics
- Award number: 1218282
- Fiscal year: 2012
- Amount: $500,000
- Program type: Continuing Grant
III: Small: Collaborative Research: Using Large-Scale Image Data for Online Social Media Analysis
- Award number: 1115313
- Fiscal year: 2011
- Amount: $500,000
- Program type: Standard Grant
Collaborative Research: Discovering and Exploiting Latent Communities in Social Media
- Award number: 1111142
- Fiscal year: 2011
- Amount: $500,000
- Program type: Standard Grant
Indexing, Mining and Modeling Spatio-Temporal Patterns of Gene Expressions
- Award number: 0640543
- Fiscal year: 2007
- Amount: $500,000
- Program type: Continuing Grant
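Similar NSFC (National Natural Science Foundation of China) Grants
Molecular design, synthesis, and anti-HIV activity of DKA-DAPYs as dual HIV-1 reverse transcriptase/integrase inhibitors
- Award number: 21402148
- Approval year: 2014
- Amount: CNY 250,000
- Program type: Young Scientists Fund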
Similar Overseas Grants
BIGDATA: F: DKA: Collaborative Research: Randomized Numerical Linear Algebra (RandNLA) for multi-linear and non-linear data
- Award number: 1661760
- Fiscal year: 2016
- Amount: $500,000
- Program type: Standard Grant
BIGDATA: F: DKA: Collaborative Research: High-Dimensional Statistical Machine Learning for Spatio-Temporal Climate Data
- Award number: 1664720
- Fiscal year: 2016
- Amount: $500,000
- Program type: Standard Grant
BIGDATA: F: DKA: Collaborative Research: Structured Nearest Neighbor Search in High Dimensions
- Award number: 1447473
- Fiscal year: 2015
- Amount: $500,000
- Program type: Standard Grant
BIGDATA: F: DKA: Collaborative Research: Structured Nearest Neighbor Search in High Dimensions
- Award number: 1447413
- Fiscal year: 2015
- Amount: $500,000
- Program type: Standard Grant
BIGDATA: F: DKA: Collaborative Research: Structured Nearest Neighbor Search in High Dimensions
- Award number: 1447476
- Fiscal year: 2015
- Amount: $500,000
- Program type: Standard Grant
BIGDATA: F: DKA: Collaborative Research: Randomized Numerical Linear Algebra (RandNLA) for multi-linear and non-linear data
- Award number: 1447283
- Fiscal year: 2014
- Amount: $500,000
- Program type: Standard Grant
BIGDATA: F: DKA: Collaborative Research: Dealing Efficiently with Big Social Network Data
- Award number: 1447554
- Fiscal year: 2014
- Amount: $500,000
- Program type: Continuing Grant
BIGDATA: IA: DKA: Collaborative Research: High-Throughput Connectomics
- Award number: 1447786
- Fiscal year: 2014
- Amount: $500,000
- Program type: Standard Grant
BIGDATA: F: DKA: Collaborative Research: High-Dimensional Statistical Machine Learning for Spatio-Temporal Climate Data
- Award number: 1447566
- Fiscal year: 2014
- Amount: $500,000
- Program type: Standard Grant
BIGDATA: F: DKA: Collaborative Research: High-Dimensional Statistical Machine Learning for Spatio-Temporal Climate Data
- Award number: 1447574
- Fiscal year: 2014
- Amount: $500,000
- Program type: Standard Grant