BIGDATA:F: Statistical and Computational Optimal Transport for Geometric Data Analysis
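Basic Information
- Award number: 1838071
- Principal investigator:
- Amount: $1 million
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2018
- Funding country: United States
- Period: 2018-12-01 to 2023-11-30
- Project status: Completed
- Source:
- Keywords:
Project Abstract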
Current approaches to big data and accompanying computational methods have left behind critical applications where the data is not a collection of individual points, but rather whole geometric objects. Such applications include medical imaging, LiDAR for self-driving cars, and single-cell RNA sequencing, to name a few. Transferring the overwhelming success of simpler data processing and statistical techniques to this regime requires not only large datasets, but also suitable models and algorithms for analysis of this more general type of data. The theory of optimal transport has proven valuable to address these limitations thanks to recent advances on the computational front. Yet, understanding optimal transport as a statistical tool is still in its infancy. This project aims at developing a "geometric data analysis" toolbox based on optimal transport to tackle these new datasets. This proposal will help create a common language to interact and collaborate across disciplines. Much of this research will be integrated in this curriculum and made available through MIT OpenCourseWare. This proposal will also enable rich interdisciplinary training of PhD and undergraduate students.
The proposed methods are built around the rich mathematical theory of optimal transport (OT). This theory provides a framework for the development of new methods for geometric data analysis in addition to their rigorous statistical and computational analysis. The nascent theory of computational optimal transport is still largely dissociated from statistics, and many methods do not account properly for sampling and measurement noise. To avoid the pitfalls of overfitting, this proposal singularly and systematically takes a statistical approach to geometric data analysis. With an understanding of the theoretical advantages and drawbacks of OT for statistical modeling, it will lead to scalable OT algorithms with strong statistical guarantees. A tangible outcome of this proposal is a cohesive toolbox extending not only averaging but also regression, classification, clustering, and other notions from classical statistics in a fashion that captures global geometric features of data. It will have a direct impact on various applications in analysis of not only medical images but also point clouds gathered by LiDAR for self-driving cars, sequences of gene expressions produced by single-cell RNA sequencing, and other diverse yet large-scale sources of data. These datasets contain millions of entities but resist application of standard statistical procedures; current state-of-the-art techniques for their analysis are ad-hoc, not generalizable, and fail to reach the quality achieved by "big data" tools in other domains. Educational impact will be made by incorporating this work in new degree programs in statistics at MIT.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outputs
Journal articles (67)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
PRNet: Self-Supervised Learning for Partial-to-Partial Registration
- DOI:
- Published: 2019-10
- Journal:
- Impact factor: 0
- Authors: Yue Wang; J. Solomon
- Corresponding authors: Yue Wang; J. Solomon
Entropic optimal transport is maximum-likelihood deconvolution
- DOI: 10.1016/j.crma.2018.10.010
- Published: 2018-09
- Journal:
- Impact factor: 0.8
- Authors: P. Rigollet; J. Weed
- Corresponding authors: P. Rigollet; J. Weed
Gaussian discrepancy: A probabilistic relaxation of vector balancing
- DOI: 10.1016/j.dam.2022.08.007
- Published: 2022
- Journal:
- Impact factor: 1.1
- Authors: Chewi, Sinho; Gerber, Patrik; Rigollet, Philippe; Turner, Paxton
- Corresponding author: Turner, Paxton
Algebraic Representations for Volumetric Frame Fields
- DOI: 10.1145/3366786
- Published: 2019-08
- Journal:
- Impact factor: 0
- Authors: David R Palmer; D. Bommes; J. Solomon
- Corresponding authors: David R Palmer; D. Bommes; J. Solomon
Model Fusion with Kullback-Leibler Divergence
- DOI:
- Published: 2020-07
- Journal:
- Impact factor: 0
- Authors: Sebastian Claici; M. Yurochkin; S. Ghosh; J. Solomon
- Corresponding authors: Sebastian Claici; M. Yurochkin; S. Ghosh; J. Solomon
Other Publications by Justin Solomon
Preoperative planning for shoulder arthroplasty is feasible with computed tomography at lower-than-conventional radiation doses
- DOI: 10.1016/j.jse.2024.08.038
- Published: 2025-05-01
- Journal:
- Impact factor: 2.900
- Authors: Kaitlyn Rodriguez; Jay Levin; Justin Solomon; Eoghan T. Hurley; Daniel Lorenzana; Ehsan Samei; Yaw Boachie-Adjie; Robert French; Oke Anakwenze; Christopher Klifto
- Corresponding author: Christopher Klifto
Lifting Directional Fields to Minimal Sections
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: David Palmer; Albert Chern; Justin Solomon
- Corresponding author: Justin Solomon
Co-Optimization of Design and Fabrication Plans for Carpentry: Supplemental Material
- DOI:
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Haisen Zhao; Max Willsey; Amy Zhu; Chandrakana Nandi; Zach Tatlock; Justin Solomon; Adriana Schulz
- Corresponding author: Adriana Schulz
Other Grants by Justin Solomon
Conference: Summer Geometry Initiative 2024
- Award number: 2419933
- Fiscal year: 2024
- Funding amount: $1 million
- Project type: Standard Grant
Geometry Processing Summer Institute 2021
- Award number: 2103933
- Fiscal year: 2021
- Funding amount: $1 million
- Project type: Standard Grant
Similar Overseas Grants
STATISTICAL AND COMPUTATIONAL THRESHOLDS IN SPIN GLASSES AND GRAPH INFERENCE PROBLEMS
- Award number: 2347177
- Fiscal year: 2024
- Funding amount: $1 million
- Project type: Standard Grant
Collaborative Research: The computational and neural basis of statistical learning during musical enculturation
- Award number: 2242084
- Fiscal year: 2023
- Funding amount: $1 million
- Project type: Standard Grant
Collaborative Research: The computational and neural basis of statistical learning during musical enculturation
- Award number: 2242085
- Fiscal year: 2023
- Funding amount: $1 million
- Project type: Standard Grant
Conference: Advances in Statistical and Computational Methods for Analysis of Biomedical, Genetic, and Omics Data
- Award number: 2232547
- Fiscal year: 2023
- Funding amount: $1 million
- Project type: Standard Grant
New statistical and computational tools for optimization of planarian behavioral chemical screens
- Award number: 10658688
- Fiscal year: 2023
- Funding amount: $1 million
- Project type:
CORE 1/2: INIA Stress and Chronic Alcohol Interactions: Computational and Statistical Analysis Core (CSAC)
- Award number: 10411629
- Fiscal year: 2022
- Funding amount: $1 million
- Project type:
Statistical Methods and Computational Tools for Marine Animal Movement, Distribution and Population Size
- Award number: RGPIN-2019-05688
- Fiscal year: 2022
- Funding amount: $1 million
- Project type: Discovery Grants Program - Individual
Developing computational, statistical and machine learning methods to uncover biological mechanisms of complex phenotypes
- Award number: RGPIN-2021-04062
- Fiscal year: 2022
- Funding amount: $1 million
- Project type: Discovery Grants Program - Individual
Statistical and Computational Tools for Analyzing High-Dimensional Heterogeneous Data
- Award number: 2210907
- Fiscal year: 2022
- Funding amount: $1 million
- Project type: Standard Grant
Bridging Statistical Hypothesis Tests and Deep Learning for Reliability and Computational Efficiency
- Award number: 2134037
- Fiscal year: 2022
- Funding amount: $1 million
- Project type: Continuing Grant