Optimization Techniques for Geometrizing Real-World Data

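Basic information

  • Award number:
    1913134
  • Principal investigator:
  • Amount:
    $50.6K
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2019
  • Funding country:
    United States
  • Period:
    2019-09-01 to 2020-09-30
  • Project status:
    Completed

Project abstract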

Data is a common denominator across scientific fields, governments, and private enterprises. The ability to exploit data to find patterns has produced scientific breakthroughs and shifted business paradigms over the last several decades. This project focuses on mathematical and algorithmic techniques for specific data science problems, tailored to currently relevant domain problems, technologies, and volumes of data. The theoretical problems we consider are (i) clustering (grouping data according to similarity in an unsupervised way), (ii) dimensionality reduction (reducing the volume of the data while preserving its relevant features), and (iii) quadratic assignment (finding correspondences between different datasets). The main underlying application we consider in this project is computational biology, in particular the processing of single-cell sequencing data. The technology for single-cell sequencing was developed very recently and is improving quickly, producing new datasets, problems, and challenges that are interesting from a mathematical point of view and have potentially enormous impact. The project will have mathematicians working closely with computational biologists, with the goal of identifying data science problems that arise in the scientific domain and developing appropriate algorithms and mathematical tools.

Given single-cell gene expression data indicating how many times each gene is expressed in each cell, one objective is to select a few genes that can be used to identify different classes of cells. This problem is known in the computational biology literature as genetic marker selection. In a first approach we assume the class of each cell is known, so the problem can be posed as supervised dimensionality reduction. We model it as a projection factor recovery problem and approach it using optimization tools such as semidefinite and linear programming. The objective is twofold: to study the mathematical properties of the model we devise, and to develop an efficient tool for practitioners. A second stage of the project is to make the problem unsupervised, so clustering becomes a fundamental step. We will study stability properties of clustering methods and provide an efficient algorithm to evaluate the quality of clusters, based on statistical and optimization techniques. This tool is potentially useful across data science, not just for gene expression datasets. Finally, a third objective is to align datasets coming from different experiments. This problem is ubiquitous in data science, with graph matching and shape matching as particular cases. In computational biology the alignment problem is known as batch correction, and it can be modeled with optimal transport or as a quadratic assignment problem. We will develop alignment algorithms and study their convergence and recovery properties under different data models.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
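The quadratic assignment view of dataset alignment mentioned in the abstract can be illustrated on a toy example. The sketch below is not the project's algorithm: it assumes two equally sized point sets, compares their pairwise distance matrices, and finds the best correspondence by exhaustive search over permutations, which is only feasible for a handful of points. The function names (`pairwise_distances`, `qap_cost`, `align_brute_force`) are hypothetical and chosen for illustration.

```python
from itertools import permutations
from math import dist


def pairwise_distances(points):
    """Return the matrix of Euclidean distances between all pairs of points."""
    return [[dist(p, q) for q in points] for p in points]


def qap_cost(A, B, perm):
    """Quadratic assignment cost: sum over i, j of (A[i][j] - B[perm[i]][perm[j]])^2."""
    n = len(A)
    return sum(
        (A[i][j] - B[perm[i]][perm[j]]) ** 2
        for i in range(n)
        for j in range(n)
    )


def align_brute_force(X, Y):
    """Match point i of X to point perm[i] of Y by minimizing the QAP cost.

    Exhaustive search over all n! permutations (illustration only).
    """
    if len(X) != len(Y):
        raise ValueError("both point sets must have the same size")
    A = pairwise_distances(X)
    B = pairwise_distances(Y)
    best = min(permutations(range(len(X))), key=lambda p: qap_cost(A, B, p))
    return list(best), qap_cost(A, B, best)


if __name__ == "__main__":
    # Y is X rotated by 90 degrees and then shuffled, so an exact match exists.
    X = [(0, 0), (1, 0), (0, 2), (3, 3)]
    Y = [(-2, 0), (-3, 3), (0, 0), (0, 1)]
    perm, cost = align_brute_force(X, Y)
    print(perm, cost)
```

Because the cost depends only on pairwise distances, the recovered correspondence is unaffected by rotating or translating either dataset. Practical alignment methods replace the exhaustive search with relaxations or optimal transport solvers.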

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Soledad Villar

Manifold optimization for k-means clustering
A polynomial-time relaxation of the Gromov-Hausdorff distance
  • DOI:
  • Publication year:
    2016
  • Journal:
  • Impact factor:
    0
  • Authors:
    Soledad Villar; A. Bandeira; A. Blumberg; Rachel A. Ward
  • Corresponding author:
    Rachel A. Ward
MarkerMap: nonlinear marker selection for single-cell studies
  • DOI:
    10.1038/s41540-024-00339-3
  • Publication year:
    2022
  • Journal:
  • Impact factor:
    4
  • Authors:
    Nabeel Sarwar; Wilson Gregory; George A. Kevrekidis; Soledad Villar; Bianca Dumitrascu
  • Corresponding author:
    Bianca Dumitrascu
Shuffled linear regression through graduated convex relaxation
  • DOI:
    10.48550/arxiv.2209.15608
  • Publication year:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Efe Onaran; Soledad Villar
  • Corresponding author:
    Soledad Villar
Three proofs of the Benedetto-Fickus theorem
  • DOI:
  • Publication year:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    D. Mixon; Tom Needham; C. Shonkwiler; Soledad Villar
  • Corresponding author:
    Soledad Villar

Other grants held by Soledad Villar

CAREER: Symmetries and Classical Physics in Machine Learning for Science and Engineering
  • Award number:
    2339682
  • Fiscal year:
    2024
  • Funding amount:
    $50.6K
  • Project type:
    Continuing Grant
Collaborative Research: CIF: Medium: Understanding Robustness via Parsimonious Structures.
  • Award number:
    2212457
  • Fiscal year:
    2022
  • Funding amount:
    $50.6K
  • Project type:
    Standard Grant
Optimization Techniques for Geometrizing Real-World Data
  • Award number:
    2044349
  • Fiscal year:
    2020
  • Funding amount:
    $50.6K
  • Project type:
    Standard Grant

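Similar NSFC grants

Estimating Large Demand Systems with Machine Learning Techniques
  • Award number:
  • Approval year:
    2024
  • Funding amount:
    not given (unit: 10,000 CNY)
  • Project type:
    Research Fund for International Scientists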

Similar overseas grants

Postdoctoral Fellowship: OPP-PRF: Leveraging Community Structure Data and Machine Learning Techniques to Improve Microbial Functional Diversity in an Arctic Ocean Ecosystem Model
  • Award number:
    2317681
  • Fiscal year:
    2024
  • Funding amount:
    $50.6K
  • Project type:
    Standard Grant
RII Track-4:NSF: Design of zeolite-encapsulated metal phthalocyanines catalysts enabled by insights from synchrotron-based X-ray techniques
  • Award number:
    2327267
  • Fiscal year:
    2024
  • Funding amount:
    $50.6K
  • Project type:
    Standard Grant
CAREER: Data-Driven Hardware and Software Techniques to Enable Sustainable Data Center Services
  • Award number:
    2340042
  • Fiscal year:
    2024
  • Funding amount:
    $50.6K
  • Project type:
    Continuing Grant
Creating a reflective, assessment workbook for University teachers to enhance teaching techniques and improve student engagement, by incorporating International Baccalaureate (IB) teaching practices
  • Award number:
    24K06129
  • Fiscal year:
    2024
  • Funding amount:
    $50.6K
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Developing Advanced Cryptanalysis Techniques for Symmetric-key Primitives with Real-world Public-key Applications
  • Award number:
    24K20733
  • Fiscal year:
    2024
  • Funding amount:
    $50.6K
  • Project type:
    Grant-in-Aid for Early-Career Scientists
Development of new molecular self-temperature sensing techniques using luminescence-absorption hybrid thermometry
  • Award number:
    24K17691
  • Fiscal year:
    2024
  • Funding amount:
    $50.6K
  • Project type:
    Grant-in-Aid for Early-Career Scientists
Novel techniques of percutaneous sonography-guided surgical operations (SonoSurgery
  • Award number:
    10087309
  • Fiscal year:
    2024
  • Funding amount:
    $50.6K
  • Project type:
    Collaborative R&D
ConSenT: Connected Sensing Techniques: Cooperative Radar Networks Using Joint Radar and Communication Waveforms
  • Award number:
    EP/Y035933/1
  • Fiscal year:
    2024
  • Funding amount:
    $50.6K
  • Project type:
    Fellowship
ERI: SDR Beyond Radio: Enabling Experimental Research in Multi-Node Optical Wireless Networks via Software Defined Radio Tools and Techniques
  • Award number:
    2347514
  • Fiscal year:
    2024
  • Funding amount:
    $50.6K
  • Project type:
    Standard Grant
CRII: SHF: Embedding techniques for mechanized reasoning about existing programs
  • Award number:
    2348490
  • Fiscal year:
    2024
  • Funding amount:
    $50.6K
  • Project type:
    Standard Grant