CAREER: Coresets for Robust and Efficient Machine Learning


Basic Information

Project Summary

Large datasets have enabled modern machine learning models to achieve unprecedented success across applications ranging from medical diagnostics to urban planning and autonomous driving. However, learning from massive data depends on exceptionally large and expensive computational resources. Such infrastructure consumes substantial energy, leaves a massive carbon footprint, and often soon becomes obsolete, turning into e-waste. While there has been a persistent effort to improve the performance and reliability of machine learning models, their sustainability is often neglected. This project aims to address the sustainability, reliability, and efficiency of machine learning by selecting the most relevant data for training. The resulting algorithms will be broadly applicable to learning from massive datasets across a wide range of applications, such as medical diagnosis and environmental sensing. The outcomes of this research will be incorporated into curriculum development to train a new generation of machine learning and data mining practitioners.

The main objective of this project is to develop a new generation of theoretically rigorous methods that enable efficient and robust learning from massive datasets. To achieve this goal, the project will develop scalable combinatorial optimization algorithms to extract weighted subsets (coresets) of the data that guarantee training dynamics similar to those of training on the full data. This enables sustainable, efficient, and accurate learning from massive data. As datasets grow larger, maintaining their quality becomes very expensive; hence, mislabeled and malicious examples become ubiquitous in large datasets. To ensure reliability in addition to sustainability and efficiency, the developed techniques will be leveraged to extract coresets of data points that enable provably robust learning against noisy labels and adversarial attacks. This project will also seek to learn better objectives for automatically extracting the most valuable data for efficient and robust learning. Finally, this research will enable efficient and robust learning frameworks that can be applied to many real-world applications through interdisciplinary collaborations.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
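To make the coreset idea concrete: one common way to select a weighted subset with similar training dynamics is to pick examples whose per-example gradients "cover" the gradients of the full dataset, then weight each pick by the number of points it represents. The sketch below is a minimal, hypothetical illustration of that greedy facility-location-style selection, not the project's actual algorithm; the function name `coreset_select`, the similarity measure, and the counting-based weights are all assumptions made for this example.

```python
import numpy as np

def coreset_select(grads: np.ndarray, k: int):
    """Greedily pick k examples whose per-example gradients best
    cover the full dataset's gradients (facility-location style).

    Returns the selected indices and, for each selected example, a
    weight equal to the number of full-set points it represents.
    """
    n = grads.shape[0]
    # Pairwise similarity: shift negated distances so all sims >= 0.
    dists = np.linalg.norm(grads[:, None, :] - grads[None, :, :], axis=-1)
    sims = dists.max() - dists

    selected = []
    best = np.zeros(n)  # similarity of each point to its closest pick so far
    for _ in range(k):
        # Marginal coverage gain of adding each candidate column j.
        gains = np.maximum(sims, best[:, None]).sum(axis=0)
        gains[selected] = -np.inf  # never re-pick an example
        j = int(np.argmax(gains))
        selected.append(j)
        best = np.maximum(best, sims[:, j])

    # Weight each pick by the size of the "cluster" it covers best, so
    # the weighted coreset gradient approximates the full-data gradient.
    assign = np.argmax(sims[:, selected], axis=1)
    weights = np.bincount(assign, minlength=k)
    return selected, weights
```

Training then proceeds on only the k selected examples, with each example's loss scaled by its weight, which is what keeps the coreset's aggregate gradient close to the full-data gradient.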

Project Outcomes

Journal Articles (0)
Monographs (0)
Research Awards (0)
Conference Papers (0)
Patents (0)


Other Publications by Baharan Mirzasoleimanbarzi

Similar International Grants

Data Reduction and Large-Scale Inference - Bayesian Coresets
  • Award Number: 2592814
  • Fiscal Year: 2021
  • Funding Amount: $529,400
  • Award Type: Studentship

NSFSaTC-BSF: TWC: Small: Enabling Secure and Private Cloud Computing using Coresets
  • Award Number: 1526815
  • Fiscal Year: 2015
  • Funding Amount: $529,400
  • Award Type: Continuing Grant