权益分类	功能权益	普通用户	{{item.name}}会员
{{category.name}}	{{benefitItem.name}}

BIGDATA: F: Multiaffine Constrained Optimization for High-Dimensional Big Data Models

BIGDATA：F：高维大数据模型的多仿射约束优化

基本信息

批准号：
1838061
负责人：
Donald Goldfarb
金额：
$ 70万
依托单位：
Columbia University
依托单位国家：
美国
项目类别：
Standard Grant
财政年份：
2018
资助国家：
美国
起止时间：
2018-10-01 至 2022-09-30
项目状态：
已结题

来源：
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1838061&HistoricalAwards=false
关键词：
BIGDATA Multiaffine Constrained Optimization Dimensional

项目摘要

This project addresses fundamental questions about properties of optimization models involving multiaffine functions and algorithms for solving them. Multiaffine functions are functions of variables, or blocks of variables, that are linear in them when all other variables, or blocks of variables, are held fixed. Optimization problems of this type arise in a wide variety of big data applications in science, engineering, medicine, statistics and social media, including machine learning, computer vision, medical and hyperspectral imaging, and tensor models to name just a few. Because these problems often involve massive amounts of data and huge numbers of variables, this project will attempt to develop efficient distributed and probabilistic approaches, enabling solutions to be obtained more rapidly than is currently possible. It is expected that the methodology that is developed will have a pervasive influence on practice in the interdisciplinary field of data science. The project will demonstrate this methodology on data sets from specific applications from a wide range of fields and disseminate the results through websites, code release and conference talks and tutorials, as well as through interactions with faculty and students from various applied disciplines in Columbia University's Data Science Institute, female students through the Society of Women Engineers at Columbia, and the hosting of high school students from under-represented minorities through Columbia University's Young Scholars Program.While providing important tools for solving real world problems, the project is also expected to have a major impact on the theoretical underpinnings of the Alternating Direction Method of Multipliers (ADMM) and an understanding of the optimization landscape of multiaffine problems arising in data analysis. ADMM has become a major algorithmic approach for solving problems in both parallel and distributed computational settings, because of its ability to transform the computationally intensive process of solving a difficult problem into an iterative procedure that involves solving simpler problems that are coupled by a system of linear equations. The project will expand ADMMs applicability by enabling these problems to be coupled by multiaffine constraints. The project will combine this multiaffine ADMM (M-ADMM) approach with stochastic and/or distributed approaches that are provably efficient and scalable. For stochastic M-ADMM methods, how to reduce variance and importance sampling will be studied. For distributed settings, how to incorporate both centralized and decentralized concensus constraints into an M-ADMM framework will be investigated, as will asynchronous variants. The project will empirically study how to distribute the the computational effort of M-ADMM, both according to blocks of data and groups of parameters in high dimensional models. Because multiaffine problems are highly nonconvex, the solutions obtained by M-ADMM are in general only guaranteed to be local optima. It is known however, that for certain multiaffine optimization problems, every local minimum is a global minimum and every saddle point has a direction of strict negative curvature under reasonable assumptions. The project will attempt to expand these kinds of results to more general multiaffine constrained problems and study the ability of M-ADMM for avoiding stagnating near saddle points.This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

本项目解决了涉及多仿射函数和求解算法的优化模型属性的基本问题。多仿射函数是变量或变量块的函数，当所有其他变量或变量块保持固定时，它们是线性的。这种类型的优化问题出现在科学、工程、医学、统计学和社交媒体的各种大数据应用中，包括机器学习、计算机视觉、医学和高光谱成像以及张量模型等。由于这些问题通常涉及大量数据和大量变量，因此本项目将尝试开发有效的分布式和概率方法，使解决方案能够比目前更快地获得。预计所开发的方法将对数据科学跨学科领域的实践产生广泛的影响。该项目将在来自广泛领域的特定应用的数据集上展示这种方法，并通过网站、代码发布、会议演讲和教程，以及与哥伦比亚大学数据科学研究所各应用学科的教师和学生、哥伦比亚大学女工程师协会的女学生、并通过哥伦比亚大学的青年学者项目招收少数族裔高中生。在为解决现实世界问题提供重要工具的同时，该项目也有望对乘数交替方向法（ADMM）的理论基础产生重大影响，并对数据分析中出现的多仿射问题的优化前景产生理解。ADMM已经成为解决并行和分布式计算设置问题的主要算法方法，因为它能够将解决难题的计算密集型过程转化为解决由线性方程组耦合的更简单问题的迭代过程。该项目将通过使这些问题与多仿射约束耦合来扩展admm的适用性。该项目将这种多仿射ADMM （M-ADMM）方法与随机和/或分布式方法相结合，这些方法被证明是有效和可扩展的。对于随机M-ADMM方法，将研究如何减小方差和重要抽样。对于分布式设置，将研究如何将集中和分散的共识约束合并到M-ADMM框架中，以及异步变体。该项目将实证研究如何根据高维模型中的数据块和参数组分配M-ADMM的计算工作量。由于多仿射问题是高度非凸的，因此M-ADMM算法一般只能保证解是局部最优的。然而，我们知道，对于某些多仿射优化问题，在合理的假设下，每个局部最小值都是全局最小值，每个鞍点都有一个严格负曲率的方向。该项目将尝试将这些结果扩展到更一般的多仿射约束问题，并研究M-ADMM避免鞍点附近停滞的能力。该奖项反映了美国国家科学基金会的法定使命，并通过使用基金会的知识价值和更广泛的影响审查标准进行评估，被认为值得支持。