CAREER: Robust, scalable, reliable machine learning

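Basic information

  • Award number:
    1750286
  • Principal investigator:
  • Amount:
    $550,000
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Continuing Grant
  • Fiscal year:
    2018
  • Funding country:
    United States
  • Duration:
    2018-03-15 to 2024-02-29
  • Project status:
    Completed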

Project abstract

Machine learning is increasingly deployed in large-scale, mission-critical problems for the purpose of making decisions that affect a vast number of individuals' employment, savings, health, and safety. The potential for machine learning to dramatically impact and change people's lives necessitates that machine learning methods be robust, explainable, and understandable rather than black-box. This research develops new techniques that are both computationally motivated and theoretically sound for robust machine learning at scale. The work is situated in the context of three modern classes of applications. (1) Economists are interested in analyzing the efficacy of microcredit: small loans to individuals in impoverished areas with the goal of eliminating poverty. (2) Biologists are interested in using single-cell RNA sequencing data to understand cells' relationships and development trajectories. (3) The Internet of Things (IoT) is poised to generate a wealth of complex data across energy readings in buildings, within transportation infrastructure, from vehicles on the road, and from many other sensor sources. The PI is working directly with area experts so as to have immediate, broad impact across application domains. In an educational component of the project, the PI is a core part of developing a new graduate curriculum and degree in statistics, data science, and statistical machine learning at MIT. The methods and applications in this proposal feature in a new course on modern machine learning methods. The PI is also developing a high-school level introduction to machine learning as part of the established Women's Technology Program (WTP).

The issues of robustness and explainability particularly arise in domains with nontrivial spatial and temporal dependencies, where the amount of data is often massive, and where practitioners typically have some expert knowledge about the domain before engaging with a particular dataset. These are precisely the domains where existing machine learning methodologies are less well-developed. The need to bring structural knowledge to bear on the problem suggests the use of Bayesian methods, which can incorporate this knowledge via prior and modeling assumptions. To live up to the promise of these methods, though, practical approaches need to be robust to assumptions as well as to noisy or adversarial data, lest this data change important decisions in ways not understood by the practitioner. This research incorporates advances in statistical physics to assess the sensitivity of a data analysis to assumptions and data values. And to realize the advantages of the proposed robust and understandable machine learning framework, practitioners must face extreme scalability issues, from both a computational perspective and a modeling perspective. On the computational side, this research builds on recent advances from computational geometry to scale to data sets at modern sizes. On the modeling side, note that while small-scale problems exhibit dense spatio-temporal dependencies, large-scale problems tend to be sparser, and practical approaches must reflect this sparsity to be reliable at scale. This work incorporates advances in probability theory to model sparse IoT networks. This proposal is highly interdisciplinary, bringing together ideas from machine learning, statistics, physics, theoretical computer science, probability theory, and systems and applying these ideas to microcredit, single-cell RNA sequencing, sensor networks, international trade, and industrial applications including customer service at scale.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal articles (18)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Scalable Gaussian Process Inference with Finite-data Mean and Variance Guarantees
  • DOI:
    10.48550/arxiv.1806.10234
  • Published:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Huggins Jonathan H.
  • Corresponding author:
    Huggins Jonathan H.
The Kernel Interaction Trick: Fast Bayesian Discovery of Pairwise Interactions in High Dimensions
  • DOI:
  • Published:
    2019-05
  • Journal:
  • Impact factor:
    0
  • Authors:
    Raj Agrawal;Jonathan Huggins;Brian L. Trippe;Tamara Broderick
  • Corresponding author:
    Raj Agrawal;Jonathan Huggins;Brian L. Trippe;Tamara Broderick
Validated Variational Inference via Practical Posterior Error Bounds
  • DOI:
  • Published:
    2019-10
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jonathan Huggins;Mikolaj Kasprzak;Trevor Campbell;Tamara Broderick
  • Corresponding author:
    Jonathan Huggins;Mikolaj Kasprzak;Trevor Campbell;Tamara Broderick
Local exchangeability
  • DOI:
    10.3150/22-bej1533
  • Published:
    2023
  • Journal:
  • Impact factor:
    1.5
  • Authors:
    Campbell, Trevor;Syed, Saifuddin;Yang, Chiao-Yu;Jordan, Michael I.;Broderick, Tamara
  • Corresponding author:
    Broderick, Tamara
Scaled Process Priors for Bayesian Nonparametric Estimation of the Unseen Genetic Variation
  • DOI:
    10.1080/01621459.2022.2115918
  • Published:
    2022
  • Journal:
  • Impact factor:
    3.7
  • Authors:
    Camerlenghi, Federico;Favaro, Stefano;Masoero, Lorenzo;Broderick, Tamara
  • Corresponding author:
    Broderick, Tamara

Other publications by Tamara Broderick

Return of the Infinitesimal Jackknife
  • DOI:
  • Published:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Ryan Giordano;William T. Stephenson;Runjing Liu;Michael I. Jordan;Tamara Broderick
  • Corresponding author:
    Tamara Broderick
Truncated random measures
  • DOI:
  • Published:
    2016
  • Journal:
  • Impact factor:
    1.5
  • Authors:
    Trevor Campbell;Jonathan Huggins;J. How;Tamara Broderick
  • Corresponding author:
    Tamara Broderick
Redshift Accuracy Requirements for Future Supernova and Number Count Surveys
  • DOI:
    10.1086/424726
  • Published:
    2004
  • Journal:
  • Impact factor:
    0
  • Authors:
    D. Huterer;A. Kim;L. Krauss;Tamara Broderick
  • Corresponding author:
    Tamara Broderick
Comment: Nonparametric Bayes Modeling of Populations of Networks
Covariance Matrices and Influence Scores for Mean Field Variational Bayes
  • DOI:
  • Published:
    2015
  • Journal:
  • Impact factor:
    0
  • Authors:
    Ryan Giordano;Tamara Broderick
  • Corresponding author:
    Tamara Broderick

Other grants held by Tamara Broderick

Collaborative Research: PPoSS: Planning: Scalable Systems for Probabilistic Programming
  • Award number:
    2029016
  • Fiscal year:
    2020
  • Amount:
    $550,000
  • Award type:
    Standard Grant
Workshop for Women in Machine Learning
  • Award number:
    1833154
  • Fiscal year:
    2018
  • Amount:
    $550,000
  • Award type:
    Standard Grant

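Similar National Natural Science Foundation of China (NSFC) grants

Molecular mechanism by which symbiotic bacteria of Amphidinium carterae degrade phosphonates to produce an algal-growth-promoting effect
  • Award number:
    42306167
  • Approval year:
    2023
  • Amount:
    300,000 CNY
  • Award type:
    Young Scientists Fund
Analysis and design of efficient, robust message authentication codes
  • Award number:
    61202422
  • Approval year:
    2012
  • Amount:
    230,000 CNY
  • Award type:
    Young Scientists Fund
Research on semidefinite relaxation and nonconvex quadratically constrained quadratic programming
  • Award number:
    11271243
  • Approval year:
    2012
  • Amount:
    600,000 CNY
  • Award type:
    General Program
Research on new methods for covert underwater active detection based on composite coded pulse trains
  • Award number:
    61271414
  • Approval year:
    2012
  • Amount:
    600,000 CNY
  • Award type:
    General Program
Research on several problems in network revenue management for civil aviation passenger transport
  • Award number:
    60776817
  • Approval year:
    2007
  • Amount:
    200,000 CNY
  • Award type:
    Joint Fund Program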

Similar overseas grants

CAREER: Scalable and Robust Uncertainty Quantification using Subsampling Markov Chain Monte Carlo Algorithms
  • Award number:
    2340586
  • Fiscal year:
    2024
  • Amount:
    $550,000
  • Award type:
    Continuing Grant
CAREER: Towards Scalable and Robust Inference of Phylogenetic Networks
  • Award number:
    2144367
  • Fiscal year:
    2022
  • Amount:
    $550,000
  • Award type:
    Continuing Grant
CAREER: Scalable and Robust Dynamic Matching Market Design
  • Award number:
    1846237
  • Fiscal year:
    2019
  • Amount:
    $550,000
  • Award type:
    Continuing Grant
CAREER: Robust and scalable genome-wide phylogenetics
  • Award number:
    1845967
  • Fiscal year:
    2019
  • Amount:
    $550,000
  • Award type:
    Continuing Grant
CAREER: Leveraging Combinatorial Structures for Robust and Scalable Learning
  • Award number:
    1845032
  • Fiscal year:
    2019
  • Amount:
    $550,000
  • Award type:
    Continuing Grant