CAREER: accelerating machine learning with low dimensional structure

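Basic information

Award number:
1943131
Principal investigator:
Madeleine Udell
Amount:
$550,000
Host institution:
Cornell University
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2020
Funding country:
United States
Start and end dates:
2020-10-01 to 2022-08-31
Project status:
Completed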

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1943131&HistoricalAwards=false
Keywords:
CAREER accelerating machine learning low

Project abstract

Big datasets are everywhere: in science, in health, in commerce, and in government, data is becoming easier and cheaper to collect. Yet extracting value from this data is a challenge, because every step requires human intervention: cleaning the data, identifying useful features, and choosing a machine learning model. The goal of this project is to develop new methods to accelerate and automate the basic machine learning (ML) workflow. Automation frees data scientists from data cleaning and parameter twiddling to concentrate on the important questions: are we solving the right problems, and do we have the right data? This project will help democratize machine learning and promote data-driven decision making by developing automated methods to clean data and to choose ML models, including open source software packages that make these methods widely available and easy to use. The project also advances these goals by training data scientists in how to use these models and understand their potential risks.

Low dimensional structure provides the key to meeting the diverse challenges required to automate machine learning. This project relies on the central insight that measurements of a complex object, such as a patient in a hospital, a respondent to a survey, or even an ML dataset, can be well described as simple functions (or even linear functions) of an underlying low dimensional latent vector. The project develops new algorithms and software to identify low dimensional latent vectors and to use them to a) clean the data by denoising observations or imputing missing entries, b) reduce the dimensionality of feature vectors, and c) recommend better algorithms. This project will develop new techniques to identify low dimensional latent vectors from sparse observations via nonlinear (even discontinuous) functions, with efficient algorithms and with theoretical guarantees. To enable more efficient automated machine learning, the project will develop methods to localize similar datasets near each other in a low dimensional space, so that nearness in this space predicts similar performance of machine learning methods.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
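
As an illustration of the central insight in the abstract (measurements as linear functions of a low dimensional latent vector, used to impute missing entries), the following is a minimal sketch of generic low-rank imputation by iterative truncated SVD. It is not the project's own algorithm or software. The function name `impute_low_rank`, the synthetic data, the chosen rank, and the 80% observation rate are all assumptions made for this example.

```python
import numpy as np

def impute_low_rank(X, mask, rank, n_iters=200):
    """Fill entries where mask is False using iterative truncated SVD.

    Hypothetical helper for illustration only; not taken from the project.
    """
    # Start by filling missing entries with the mean of the observed ones.
    filled = np.where(mask, X, X[mask].mean())
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Best rank-`rank` approximation of the current filled matrix.
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep observed entries fixed; update only the missing ones.
        filled = np.where(mask, X, low_rank)
    return filled

# Synthetic data (an assumption): 50 objects, each described by a
# 2-dimensional latent vector, measured through 20 linear functions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
loadings = rng.normal(size=(2, 20))
X_true = latent @ loadings

# Hide roughly 20% of the entries at random.
mask = rng.random(X_true.shape) < 0.8
X_obs = np.where(mask, X_true, 0.0)

X_hat = impute_low_rank(X_obs, mask, rank=2)

# Compare against a naive baseline that fills every gap with the observed mean.
X_mean = np.where(mask, X_obs, X_obs[mask].mean())
missing_norm = np.linalg.norm(X_true[~mask])
rel_err = np.linalg.norm((X_hat - X_true)[~mask]) / missing_norm
baseline_err = np.linalg.norm((X_mean - X_true)[~mask]) / missing_norm
print(f"low-rank imputation error: {rel_err:.3f}, mean-fill error: {baseline_err:.3f}")
```

The sketch covers only the linear case with a known rank. The nonlinear and discontinuous observation models mentioned in the abstract would need a different approach.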
