Efficient Distribution Classification Tasks via Optimal Transport Embeddings

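Basic information

  • Award number:
    2111322
  • Principal investigator:
  • Amount:
    $127.4K
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Start and end dates:
    2021-07-01 to 2023-01-31
  • Project status:
    Completed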

Project abstract

Classification allows one to organize data based on similarities and can provide insight into underlying relationships in a large variety of fields, including cancer research, survey analysis, and image and text processing. As a result, the development of efficient algorithms for classification tasks is an important research area. One approach, machine learning, has proved successful in classification tasks, but it is usually focused on data points in vector spaces. In many applications, however, instances of data are naturally interpreted as entire point clouds, or as distributions, and do not lie in a vector space. Furthermore, the high dimension of such datasets leads to theoretical and computational challenges. This project is devoted to the development of classification algorithms for high-dimensional datasets consisting of distributions, and will focus both on their theoretical analysis and computational efficiency. To this end, the principal investigator will use the framework of optimal transport, which provides a natural way of comparing distributions. Students will be involved and trained in interdisciplinary aspects of this project.

This project applies knowledge from computational optimal transport, such as linear embeddings and regularized optimization, and machine learning algorithms, to study classification tasks for datasets consisting of distributions. The main goal is to develop approximation methods with guaranteed error bounds that also allow for algorithmic insights and efficient implementation. Open problems on approximation power, computational feasibility, and numerical analysis will be addressed. Specifically, the project addresses four fundamental questions that arise in the field: (1) What are the types of distributions that can be classified with traditional machine learning techniques through linear embeddings, and how does the choice of a regularizer affect accuracy? (2) How well can the Wasserstein distance and Wasserstein barycenters be approximated through linear embeddings using Euclidean distances? (3) Under which conditions can we guarantee separability with simple classifiers in the embedding space for disjoint classes of distributions? (4) How can we tailor our framework to address various applications, such as classifying structures in audio or video segments?

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
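
As an illustration of question (2), the following is a minimal sketch, not the project's own method, of a linear embedding in the one-dimensional case, where the Wasserstein-2 distance between two distributions equals the L2 distance between their quantile functions. The function names (`lot_embed_1d`, `w2_from_embeddings`, `make_cloud`, `classify`), the quantile grid size, and the synthetic Gaussian data are all assumptions made for this example.

```python
import numpy as np


def lot_embed_1d(samples, n_quantiles=100):
    """Embed a 1D point cloud as its quantile function on a midpoint grid.

    The 1/sqrt(n_quantiles) scaling makes the Euclidean distance between two
    embeddings a midpoint-rule approximation of the Wasserstein-2 distance.
    """
    levels = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return np.quantile(samples, levels) / np.sqrt(n_quantiles)


def w2_from_embeddings(e1, e2):
    """Approximate Wasserstein-2 distance as a Euclidean distance."""
    return float(np.linalg.norm(e1 - e2))


rng = np.random.default_rng(0)


def make_cloud(mean):
    """Hypothetical data: a Gaussian point cloud with a slightly jittered mean."""
    return rng.normal(loc=mean + rng.normal(scale=0.2), scale=1.0, size=500)


# Two classes of distributions: clouds centred near 0 and clouds centred near 3.
train = [(make_cloud(0.0), 0) for _ in range(20)]
train += [(make_cloud(3.0), 1) for _ in range(20)]

X = np.stack([lot_embed_1d(cloud) for cloud, _ in train])
y = np.array([label for _, label in train])

# Nearest-centroid classifier in the embedding space.
centroids = np.stack([X[y == k].mean(axis=0) for k in (0, 1)])


def classify(cloud):
    """Assign a point cloud to the class with the closest embedded centroid."""
    e = lot_embed_1d(cloud)
    return int(np.argmin(np.linalg.norm(centroids - e, axis=1)))


print("approx. W2 between class centroids:",
      w2_from_embeddings(centroids[0], centroids[1]))
print("predicted class for a new cloud near 3:", classify(make_cloud(3.0)))
```

In one dimension the mean of the embedded vectors corresponds to the quantile function of the Wasserstein barycenter, so the nearest-centroid rule here amounts to comparing a new point cloud against an approximate barycenter of each class. In higher dimensions the embedding would require computing transport maps to a reference distribution, which this sketch does not attempt.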

Project outcomes

Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Linear optimal transport embedding: provable Wasserstein classification for certain rigid transformations and perturbations
Supervised learning of sheared distributions using linearized optimal transport
  • DOI:
    10.1007/s43670-022-00038-2
  • Publication date:
    2022-01
  • Journal:
  • Impact factor:
    0
  • Authors:
    Varun Khurana; Harish Kannan; A. Cloninger; Caroline Moosmüller
  • Corresponding authors:
    Varun Khurana; Harish Kannan; A. Cloninger; Caroline Moosmüller
Hermite B-Splines: n-Refinability and Mask Factorization
  • DOI:
    10.3390/math9192458
  • Publication date:
    2021
  • Journal:
  • Impact factor:
    2.4
  • Authors:
    Cotronei, Mariantonia; Moosmüller, Caroline
  • Corresponding author:
    Moosmüller, Caroline

Other publications by Caroline Moosmueller

Periodicity Scoring of Time Series Encodes Dynamical Behavior of the Tumor Suppressor p53
  • DOI:
    10.1101/2020.02.04.933192
  • Publication date:
    2020
  • Journal:
  • Impact factor:
    0
  • Authors:
    Caroline Moosmueller; Christopher J. Tralie; M. Kooshkbaghi; Z. Belkhatir; Maryam Pouryahya; J. Reyes; J. Deasy; A. Tannenbaum; I. Kevrekidis
  • Corresponding author:
    I. Kevrekidis

Other grants held by Caroline Moosmueller

Efficient Distribution Classification Tasks via Optimal Transport Embeddings
  • Award number:
    2306064
  • Fiscal year:
    2022
  • Funding amount:
    $127.4K
  • Project type:
    Continuing Grant

Similar overseas grants

Efficient Distribution Classification Tasks via Optimal Transport Embeddings
  • Award number:
    2306064
  • Fiscal year:
    2022
  • Funding amount:
    $127.4K
  • Project type:
    Continuing Grant
Distribution of marine diatom small Chaetoceros spp. in coastal area in Japan and the classification using the technique of molecular biology
  • Award number:
    17K07888
  • Fiscal year:
    2017
  • Funding amount:
    $127.4K
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Utility of Ducks Unlimited's Wetland Classification for predicting amphibian distribution and abundance with an emphasis on Canadian toads
  • Award number:
    460839-2013
  • Fiscal year:
    2015
  • Funding amount:
    $127.4K
  • Project type:
    Industrial Postgraduate Scholarships
Utility of Ducks Unlimited's Wetland Classification for predicting amphibian distribution and abundance with an emphasis on Canadian toads
  • Award number:
    460839-2013
  • Fiscal year:
    2014
  • Funding amount:
    $127.4K
  • Project type:
    Industrial Postgraduate Scholarships
Anomaly Detection, Classification, and Distribution in Next Generation Radar-Based Monitoring and Control Systems
  • Award number:
    398768-2010
  • Fiscal year:
    2013
  • Funding amount:
    $127.4K
  • Project type:
    Collaborative Research and Development Grants
Utility of Ducks Unlimited's Wetland Classification for predicting amphibian distribution and abundance with an emphasis on Canadian toads
  • Award number:
    460839-2013
  • Fiscal year:
    2013
  • Funding amount:
    $127.4K
  • Project type:
    Industrial Postgraduate Scholarships
CIF: Small: Distribution-Adaptive Prediction and Classification
  • Award number:
    1217880
  • Fiscal year:
    2012
  • Funding amount:
    $127.4K
  • Project type:
    Standard Grant
Anomaly Detection, Classification, and Distribution in Next Generation Radar-Based Monitoring and Control Systems
  • Award number:
    398768-2010
  • Fiscal year:
    2012
  • Funding amount:
    $127.4K
  • Project type:
    Collaborative Research and Development Grants
Anomaly Detection, Classification, and Distribution in Next Generation Radar-Based Monitoring and Control Systems
  • Award number:
    398768-2010
  • Fiscal year:
    2011
  • Funding amount:
    $127.4K
  • Project type:
    Collaborative Research and Development Grants
Automatic classification of sidescan sonar image for mapping aquatic macrophyte distribution
  • Award number:
    23710039
  • Fiscal year:
    2011
  • Funding amount:
    $127.4K
  • Project type:
    Grant-in-Aid for Young Scientists (B)