CAREER: accelerating machine learning with low dimensional structure
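Basic information
- Award number: 1943131
- Principal investigator:
- Amount: $550,000
- Institution:
- Institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2020
- Funding country: United States
- Period: 2020-10-01 to 2022-08-31
- Project status: Completed
- Source:
- Keywords:
Project abstract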
Big datasets are everywhere: in science, in health, in commerce, and in government, data is becoming easier and cheaper to collect. Yet extracting value from this data is a challenge; every step requires human intervention: cleaning the data, identifying useful features, and choosing a machine learning model. The goal of this project is to develop new methods to accelerate and automate the basic machine learning (ML) workflow. Automation frees data scientists from data cleaning and parameter twiddling to concentrate on the important questions: are we solving the right problems, and do we have the right data? This project will help democratize machine learning and promote data-driven decision making by developing automated methods to clean data and to choose ML models, including open source software packages that make these methods widely available and easy to use. The project also advances these goals by training data scientists in how to use these models and understand their potential risks.
Low dimensional structure provides the key to meeting the diverse challenges required to automate machine learning. This project relies on the central insight that measurements of a complex object, such as a patient in a hospital, a respondent on a survey, or even an ML dataset, can be well described as simple functions (or even linear functions) of an underlying low dimensional latent vector. The project develops new algorithms and software to identify low dimensional latent vectors and to use them to a) clean the data by denoising observations or imputing missing entries, b) reduce the dimensionality of feature vectors, and c) recommend better algorithms. This project will develop new techniques to identify low dimensional latent vectors from sparse observations via nonlinear (even discontinuous) functions, with efficient algorithms and with theoretical guarantees. To enable more efficient automated machine learning, the project will develop methods to place similar datasets near each other in a low dimensional space, so that nearness in this space predicts similar performance of machine learning methods.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
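The abstract's core idea, that observations are roughly a linear function of a low dimensional latent vector, can be illustrated with a toy imputation example. The sketch below is not the project's algorithm or software. It assumes the simplest possible case, a noise-free rank-1 table with positive entries, and fits it by alternating least squares using only the observed entries. The function name `rank1_impute` and the example data are hypothetical.

```python
def rank1_impute(X, iters=200):
    """Fill None entries of X with a rank-1 fit u[i] * v[j].

    Illustrative assumption: each row i has a scalar latent factor u[i]
    and each column j has a scalar loading v[j], so X[i][j] ~ u[i] * v[j].
    """
    m, n = len(X), len(X[0])
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(iters):
        # Least-squares update of each row factor from that row's observed entries.
        for i in range(m):
            obs = [j for j in range(n) if X[i][j] is not None]
            den = sum(v[j] ** 2 for j in obs)
            if den > 0:
                u[i] = sum(X[i][j] * v[j] for j in obs) / den
        # Least-squares update of each column loading from that column's observed entries.
        for j in range(n):
            obs = [i for i in range(m) if X[i][j] is not None]
            den = sum(u[i] ** 2 for i in obs)
            if den > 0:
                v[j] = sum(X[i][j] * u[i] for i in obs) / den
    # Keep observed values; fill missing ones from the fitted factors.
    return [
        [X[i][j] if X[i][j] is not None else u[i] * v[j] for j in range(n)]
        for i in range(m)
    ]


# Hypothetical data: the outer product of [1, 2, 3, 4] and [1, 2, 0.5],
# with three entries hidden (None).
X = [
    [1.0, 2.0, None],
    [2.0, None, 1.0],
    [3.0, 6.0, 1.5],
    [None, 8.0, 2.0],
]
filled = rank1_impute(X)
```

Real data is noisy and rarely exactly rank 1, so a practical method would use a higher rank, regularization, and a loss suited to the data type. The sketch only shows why a shared low dimensional structure makes missing entries recoverable from the observed ones.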
Project outputs
Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Robust Non-Linear Matrix Factorization for Dictionary Learning, Denoising, and Clustering
- DOI: 10.1109/tsp.2021.3062988
- Published: 2020-05
- Journal:
- Impact factor: 5.4
- Authors: Jicong Fan; Chengrun Yang; Madeleine Udell
- Corresponding authors: Jicong Fan; Chengrun Yang; Madeleine Udell
Low-Rank Tucker Approximation of a Tensor From Streaming Data
- DOI: 10.1137/19m1257718
- Published: 2019-04
- Journal:
- Impact factor: 0
- Authors: Yiming Sun; Yang Guo; Charlene Luo; J. Tropp; Madeleine Udell
- Corresponding authors: Yiming Sun; Yang Guo; Charlene Luo; J. Tropp; Madeleine Udell
Randomized Sketching Algorithms for Low-Memory Dynamic Optimization
- DOI: 10.1137/19m1272561
- Published: 2021-01
- Journal:
- Impact factor: 0
- Authors: R. Muthukumar; D. Kouri; Madeleine Udell
- Corresponding authors: R. Muthukumar; D. Kouri; Madeleine Udell
Scalable Semidefinite Programming
- DOI: 10.1137/19m1305045
- Published: 2019-12
- Journal:
- Impact factor: 0
- Authors: A. Yurtsever; J. Tropp; Olivier Fercoq; Madeleine Udell; V. Cevher
- Corresponding authors: A. Yurtsever; J. Tropp; Olivier Fercoq; Madeleine Udell; V. Cevher
Other publications by Madeleine Udell
kFW: A Frank-Wolfe style algorithm with stronger subproblem oracles
- DOI:
- Published: 2020
- Journal:
- Impact factor: 0
- Authors: Lijun Ding; Jicong Fan; Madeleine Udell
- Corresponding author: Madeleine Udell
OptiMUS: Optimization Modeling Using MIP Solvers and large language models
- DOI: 10.48550/arxiv.2310.06116
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Ali AhmadiTeshnizi; Wenzhi Gao; Madeleine Udell
- Corresponding author: Madeleine Udell
Big Data is Low Rank
- DOI:
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Madeleine Udell
- Corresponding author: Madeleine Udell
Scalable Approximate Optimal Diagonal Preconditioning
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Wenzhi Gao; Zhaonan Qu; Madeleine Udell; Yinyu Ye
- Corresponding author: Yinyu Ye
Missing Value Imputation for Mixed Data Through Gaussian Copula
- DOI:
- Published: 2019
- Journal:
- Impact factor: 0
- Authors: Yuxuan Zhao; Madeleine Udell
- Corresponding author: Madeleine Udell
Other grants held by Madeleine Udell
CAREER: accelerating machine learning with low dimensional structure
- Award number: 2233762
- Fiscal year: 2022
- Amount: $550,000
- Project type: Continuing Grant
Similar overseas grants
Accelerating pulse breeding using machine learning
- Award number: LP230100351
- Fiscal year: 2024
- Amount: $550,000
- Project type: Linkage Projects
NSF Convergence Accelerator Track L: Accelerating VOC Sensor Advances and Translation by Machine Learning and Bioinspiration
- Award number: 2344423
- Fiscal year: 2024
- Amount: $550,000
- Project type: Standard Grant
Accelerating drug discovery via ML-guided iterative design and optimization
- Award number: 10552325
- Fiscal year: 2023
- Amount: $550,000
- Project type:
Collaborative Research: SaTC: CORE: Medium: Accelerating Privacy-Preserving Machine Learning as a Service: From Algorithm to Hardware
- Award number: 2247893
- Fiscal year: 2023
- Amount: $550,000
- Project type: Continuing Grant
CAREER: Combining Machine Learning and Physics-based Modeling Approaches for Accelerating Scientific Discovery
- Award number: 2239175
- Fiscal year: 2023
- Amount: $550,000
- Project type: Continuing Grant
Collaborative Research: SaTC: CORE: Medium: Accelerating Privacy-Preserving Machine Learning as a Service: From Algorithm to Hardware
- Award number: 2247891
- Fiscal year: 2023
- Amount: $550,000
- Project type: Continuing Grant
SBIR Phase II: Accelerating R&D through Streamlined Machine Learning Algorithms for Small Data Applications in Advanced Manufacturing
- Award number: 2325045
- Fiscal year: 2023
- Amount: $550,000
- Project type: Cooperative Agreement
Machine-learning generated nucleases for accelerating the deployment of a novel, low-emission food production systems
- Award number: 10072768
- Fiscal year: 2023
- Amount: $550,000
- Project type: Grant for R&D
Collaborative Research: SaTC: CORE: Medium: Accelerating Privacy-Preserving Machine Learning as a Service: From Algorithm to Hardware
- Award number: 2247892
- Fiscal year: 2023
- Amount: $550,000
- Project type: Continuing Grant
AI-ADRD: Accelerating interventions of AD/ADRD via Machine learning methods
- Award number: 10682237
- Fiscal year: 2023
- Amount: $550,000
- Project type: