CAREER: Robust, scalable, reliable machine learning

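Basic Information

  • Award number:
    1750286
  • Principal investigator:
    Tamara Broderick
  • Amount:
    $550,000
  • Host institution:
    Massachusetts Institute of Technology
  • Host institution country:
    United States
  • Award type:
    Continuing Grant
  • Fiscal year:
    2018
  • Funding country:
    United States
  • Duration:
    2018-03-15 to 2024-02-29
  • Status:
    Completed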

Project Abstract

Machine learning is increasingly deployed in large-scale, mission-critical problems to make decisions that affect the employment, savings, health, and safety of vast numbers of individuals. Because machine learning can dramatically affect and change people's lives, its methods must be robust, explainable, and understandable rather than black boxes. This research develops new techniques for robust machine learning at scale that are both computationally motivated and theoretically sound. The work is situated in the context of three modern classes of applications. (1) Economists are interested in analyzing the efficacy of microcredit, that is, small loans to individuals in impoverished areas with the goal of eliminating poverty. (2) Biologists are interested in using single-cell RNA sequencing data to understand cells' relationships and developmental trajectories. (3) The Internet of Things (IoT) is poised to generate a wealth of complex data: energy readings in buildings, measurements from transportation infrastructure and from vehicles on the road, and readings from many other sensor sources. The PI is working directly with domain experts in order to have immediate, broad impact across application domains. In the educational component of the project, the PI plays a core role in developing a new graduate curriculum and degree in statistics, data science, and statistical machine learning at MIT. The methods and applications in this proposal are featured in a new course on modern machine learning methods. The PI is also developing a high-school-level introduction to machine learning as part of the established Women's Technology Program (WTP).

The issues of robustness and explainability arise particularly in domains with nontrivial spatial and temporal dependencies, where the amount of data is often massive and where practitioners typically have some expert knowledge of the domain before engaging with a particular dataset. These are precisely the domains where existing machine learning methodologies are less well developed. The need to bring structural knowledge to bear on these problems suggests the use of Bayesian methods, which can incorporate this knowledge via prior and modeling assumptions. To live up to the promise of these methods, though, practical approaches need to be robust to these assumptions as well as to noisy or adversarial data, lest such data change important decisions in ways the practitioner does not understand. This research incorporates advances from statistical physics to assess the sensitivity of a data analysis to its assumptions and data values. To realize the advantages of the proposed robust and understandable machine learning framework, practitioners must also confront extreme scalability issues, from both a computational and a modeling perspective. On the computational side, this research builds on recent advances in computational geometry to scale to modern-sized data sets. On the modeling side, while small-scale problems exhibit dense spatio-temporal dependencies, large-scale problems tend to be sparser, and practical approaches must reflect this sparsity to be reliable at scale. This work incorporates advances in probability theory to model sparse IoT networks. The proposal is highly interdisciplinary, bringing together ideas from machine learning, statistics, physics, theoretical computer science, probability theory, and systems, and applying them to microcredit, single-cell RNA sequencing, sensor networks, international trade, and industrial applications including customer service at scale.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outputs

Journal articles (18)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Scalable Gaussian Process Inference with Finite-data Mean and Variance Guarantees
  • DOI:
    10.48550/arxiv.1806.10234
  • Published:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Huggins Jonathan H.
  • Corresponding author:
    Huggins Jonathan H.
The Kernel Interaction Trick: Fast Bayesian Discovery of Pairwise Interactions in High Dimensions
  • DOI:
  • Published:
    2019-05
  • Journal:
  • Impact factor:
    0
  • Authors:
    Raj Agrawal;Jonathan Huggins;Brian L. Trippe;Tamara Broderick
  • Corresponding author:
    Raj Agrawal;Jonathan Huggins;Brian L. Trippe;Tamara Broderick
Validated Variational Inference via Practical Posterior Error Bounds
  • DOI:
  • Published:
    2019-10
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jonathan Huggins;Mikolaj Kasprzak;Trevor Campbell;Tamara Broderick
  • Corresponding author:
    Jonathan Huggins;Mikolaj Kasprzak;Trevor Campbell;Tamara Broderick
Evaluating Sensitivity to the Stick-Breaking Prior in Bayesian Nonparametrics
  • DOI:
    10.1214/22-ba1309
  • Published:
    2018-10
  • Journal:
  • Impact factor:
    4.4
  • Authors:
    Runjing Liu;Ryan Giordano;Michael I. Jordan;Tamara Broderick
  • Corresponding author:
    Runjing Liu;Ryan Giordano;Michael I. Jordan;Tamara Broderick
For high-dimensional hierarchical models, consider exchangeability of effects across covariates instead of across datasets
  • DOI:
  • Published:
    2021-07
  • Journal:
  • Impact factor:
    0
  • Authors:
    Brian L. Trippe;H. Finucane;Tamara Broderick
  • Corresponding author:
    Brian L. Trippe;H. Finucane;Tamara Broderick

Other publications by Tamara Broderick

Redshift Accuracy Requirements for Future Supernova and Number Count Surveys
  • DOI:
    10.1086/424726
  • Published:
    2004
  • Journal:
  • Impact factor:
    0
  • Authors:
    D. Huterer;A. Kim;L. Krauss;Tamara Broderick
  • Corresponding author:
    Tamara Broderick
Comment: Nonparametric Bayes Modeling of Populations of Networks
Variational Bayes for Merging Noisy Databases
  • DOI:
  • Published:
    2014
  • Journal:
  • Impact factor:
    0
  • Authors:
    Tamara Broderick;R. Steorts
  • Corresponding author:
    R. Steorts
Covariance Matrices and Influence Scores for Mean Field Variational Bayes
  • DOI:
  • Published:
    2015
  • Journal:
  • Impact factor:
    0
  • Authors:
    Ryan Giordano;Tamara Broderick
  • Corresponding author:
    Tamara Broderick
Covariance Matrices for Mean Field Variational Bayes
  • DOI:
  • Published:
    2014
  • Journal:
  • Impact factor:
    0
  • Authors:
    Ryan Giordano;Tamara Broderick
  • Corresponding author:
    Tamara Broderick

Other grants to Tamara Broderick

Collaborative Research: PPoSS: Planning: Scalable Systems for Probabilistic Programming
  • Award number:
    2029016
  • Fiscal year:
    2020
  • Amount:
    $550,000
  • Award type:
    Standard Grant
Workshop for Women in Machine Learning
  • Award number:
    1833154
  • Fiscal year:
    2018
  • Amount:
    $550,000
  • Award type:
    Standard Grant

Similar National Natural Science Foundation of China (NSFC) grants

Robust strategy analysis and robust optimization methods in supply chain management
  • Grant number:
    70601028
  • Year approved:
    2006
  • Amount:
    CNY 70,000
  • Award type:
    Young Scientists Fund
Robust speech recognition methods under psychological tension and stress
  • Grant number:
    60085001
  • Year approved:
    2000
  • Amount:
    CNY 140,000
  • Award type:
    Special Fund project
Research on robust speech recognition methods
  • Grant number:
    69075008
  • Year approved:
    1990
  • Amount:
    CNY 35,000
  • Award type:
    General Program
Improved robust sequential detection techniques
  • Grant number:
    68671030
  • Year approved:
    1986
  • Amount:
    CNY 20,000
  • Award type:
    General Program

Similar overseas grants

ERI: Robust and Scalable Manufacturing of Ultra-Sensitive and Selective Molecule Sensor Arrays
  • Award number:
    2301668
  • Fiscal year:
    2024
  • Amount:
    $550,000
  • Award type:
    Standard Grant
CAREER: Scalable and Robust Uncertainty Quantification using Subsampling Markov Chain Monte Carlo Algorithms
  • Award number:
    2340586
  • Fiscal year:
    2024
  • Amount:
    $550,000
  • Award type:
    Continuing Grant
EAGER: Quantum Manufacturing: Supporting Future Quantum Applications by Developing a Robust, Scalable Process to Create Diamond Nitrogen-Vacancy Center Qubits
  • Award number:
    2242049
  • Fiscal year:
    2023
  • Amount:
    $550,000
  • Award type:
    Standard Grant
Collaborative Research: SaTC: CORE: Small: Towards Robust, Scalable, and Resilient Radio Fingerprinting
  • Award number:
    2225161
  • Fiscal year:
    2023
  • Amount:
    $550,000
  • Award type:
    Standard Grant
Developing robust and scalable genomics tools and databases to analyze immune receptor repertoires across diverse populations
  • Award number:
    10656981
  • Fiscal year:
    2023
  • Amount:
    $550,000
  • Award type:
Collaborative Research: CISE-MSI: DP: RI: Towards Scalable, Resilient and Robust Foraging with Heterogeneous Robot Swarms
  • Award number:
    2318682
  • Fiscal year:
    2023
  • Amount:
    $550,000
  • Award type:
    Standard Grant
Collaborative Research: U.S.-Ireland R&D Partnership: CIF: AF: Small: Enabling Beyond-5G Wireless Access Networks with Robust and Scalable Cell-Free Massive MIMO
  • Award number:
    2322191
  • Fiscal year:
    2023
  • Amount:
    $550,000
  • Award type:
    Standard Grant
Collaborative Research: CISE-MSI: DP: RI: Towards Scalable, Resilient and Robust Foraging with Heterogeneous Robot Swarms
  • Award number:
    2318683
  • Fiscal year:
    2023
  • Amount:
    $550,000
  • Award type:
    Standard Grant
Robust and scalable algorithms for learning hidden structures in sparse network data with the aid of side information
  • Award number:
    2311024
  • Fiscal year:
    2023
  • Amount:
    $550,000
  • Award type:
    Standard Grant
Collaborative Research: U.S.-Ireland R&D Partnership: CIF: AF: Small: Enabling Beyond-5G Wireless Access Networks with Robust and Scalable Cell-Free Massive MIMO
  • Award number:
    2322190
  • Fiscal year:
    2023
  • Amount:
    $550,000
  • Award type:
    Standard Grant