Collaborative Research: Data-driven Path Metrics for Machine Learning
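Basic Information
- Award number: 2131292
- Principal investigator:
- Amount: $150,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2021
- Funding country: United States
- Project period: 2021-01-01 to 2022-06-30
- Project status: Completed
- Source:
- Keywords:
Project Abstract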
The era of big data has introduced unprecedented computational and mathematical challenges. Traditional machine learning algorithms often lack scalable computational complexity, while modern approaches lack solid mathematical foundations. Moreover, high data dimensionality creates challenges for traditional methods of data analysis. The principal investigators (PIs) propose to combine classic dimension reduction methods with data-driven distances, so that both the distance and the embedding procedure are data dependent. This novel approach allows for greater flexibility in balancing the density-based and geometric features of the data, achieves a density-based simplification of geometry, and insightfully represents the data in a small number of dimensions. In contrast to black-box methods such as deep learning, the developed methodology can be rigorously analyzed to derive strong theoretical guarantees for several statistical and machine learning tasks. This research will contribute computational tools for cancer immunogenomics, and the investigators will consult with the Rogel Cancer Center at the University of Michigan on scientific questions related to tumor immunology and T-cell biology. In addition, new data analysis tools will be made publicly available in an open-source software package. The investigators' approach is driven by the analysis of a family of data-dependent path metrics. These metrics are both density-sensitive and geometry-preserving, with the balance governed by the choice of a single parameter p. By utilizing the space of paths through data, the PIs will obtain density-based metrics and embeddings while avoiding the explicit computation of a density estimator, which may be unreliable in a large number of dimensions.
The PIs will propose a simple yet highly flexible data model which does not assume the data is sampled from a manifold or collection of manifolds, and investigate the continuous limit of these metrics and an associated graph Laplacian operator. By continuously varying the parameter p, the PIs propose to create data videos which represent the data from multiple perspectives. The PIs will investigate both multidimensional scaling and graph Laplacian embeddings as mechanisms for obtaining path-based low-dimensional representations, and will explore fast algorithms with scalable computational complexity for approximating these metrics. The PIs will contextualize path metrics in the larger framework of data-driven metrics and focus specifically on the analysis of biological data. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
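As a rough illustration of the idea in the abstract, the sketch below computes a path metric governed by a single parameter p and feeds it to classical multidimensional scaling. The abstract does not state the formula, so the code assumes one common power-weighted shortest-path form: the distance between two points is the minimum, over paths through the data, of the sum of Euclidean edge lengths raised to the power p, with the result raised to 1/p. The function names `path_metric` and `classical_mds` are hypothetical, and the dense all-pairs computation is for small examples only; it is not the fast algorithm the project proposes to develop.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import pdist, squareform


def path_metric(points, p):
    """Assumed form: (min over paths of sum of edge_length**p) ** (1/p).

    Uses the complete graph on the data, so memory is O(n^2).
    Duplicate points (zero distance) would be read as missing edges.
    """
    euclid = squareform(pdist(points))  # dense pairwise Euclidean distances
    # Dijkstra on edge weights |x_i - x_j|**p; zeros on the diagonal are non-edges.
    costs = shortest_path(euclid ** p, method="D", directed=False)
    return costs ** (1.0 / p)


def classical_mds(dist, dim):
    """Classical MDS: double-center squared distances, keep top eigenpairs."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]
    # Clip small negative eigenvalues, which arise when dist is not Euclidean.
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(60, 5))
    # One embedding per value of p; sweeping p gives a sequence of views of the data.
    for p in (1.0, 2.0, 4.0):
        embedding = classical_mds(path_metric(data, p), dim=2)
        print(p, embedding.shape)
```

Under this assumed form, p = 1 recovers the Euclidean distance (by the triangle inequality), while larger p makes paths of many short hops through dense regions cheaper than a single long hop, which is where the density sensitivity comes from.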
Project Outcomes
Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Balancing Geometry and Density: Path Distances on High-Dimensional Data
- DOI: 10.1137/20m1386657
- Published: 2020-12
- Journal:
- Impact factor: 0
- Authors: A. Little; Daniel McKenzie; James M. Murphy
- Corresponding authors: A. Little; Daniel McKenzie; James M. Murphy
Path-Based Spectral Clustering: Guarantees, Robustness to Outliers, and Fast Algorithms
- DOI:
- Published: 2017-12
- Journal:
- Impact factor: 0
- Authors: A. Little; M. Maggioni; James M. Murphy
- Corresponding authors: A. Little; M. Maggioni; James M. Murphy
An analysis of classical multidimensional scaling with applications to clustering
- DOI: 10.1093/imaiai/iaac004
- Published: 2022-04-23
- Journal:
- Impact factor: 1.6
- Authors: Little, Anna; Xie, Yuying; Sun, Qiang
- Corresponding author: Sun, Qiang
Other publications by Anna Little
Worth the Wait? Time Window Feature Optimization for Attack Classification
- DOI: 10.1109/bigdata47090.2019.9006304
- Published: 2019
- Journal:
- Impact factor: 0
- Authors: C. Wilson; X. Mountrouidou; Anna Little
- Corresponding author: Anna Little
Spectral Clustering Technique for Classifying Network Attacks
- DOI:
- Published: 2016
- Journal:
- Impact factor: 0
- Authors: Anna Little; X. Mountrouidou; Daniel Moseley
- Corresponding author: Daniel Moseley
Other grants held by Anna Little
Moment Invariant Data Aggregation for Signal Processing and Distribution Learning
- Award number: 2309570
- Fiscal year: 2023
- Amount: $150,000
- Award type: Standard Grant
Collaborative Research: Data-driven Path Metrics for Machine Learning
- Award number: 1912906
- Fiscal year: 2019
- Amount: $150,000
- Award type: Standard Grant
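Similar NSFC (National Natural Science Foundation of China) grants
Research on Quantum Field Theory without a Lagrangian Description
- Award number: 24ZR1403900
- Year approved: 2024
- Amount: CNY 0
- Award type: Provincial/municipal project
Cell Research
- Award number: 31224802
- Year approved: 2012
- Amount: CNY 240,000
- Award type: Special fund project
Cell Research
- Award number: 31024804
- Year approved: 2010
- Amount: CNY 240,000
- Award type: Special fund project
Cell Research
- Award number: 30824808
- Year approved: 2008
- Amount: CNY 240,000
- Award type: Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
- Award number: 10774081
- Year approved: 2007
- Amount: CNY 450,000
- Award type: General Program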
Similar overseas grants
Collaborative Research: GEO OSE Track 2: Developing CI-enabled collaborative workflows to integrate data for the SZ4D (Subduction Zones in Four Dimensions) community
- Award number: 2324714
- Fiscal year: 2024
- Amount: $150,000
- Award type: Standard Grant
Collaborative Research: Constraining next generation Cascadia earthquake and tsunami hazard scenarios through integration of high-resolution field data and geophysical models
- Award number: 2325311
- Fiscal year: 2024
- Amount: $150,000
- Award type: Standard Grant
Collaborative Research: CDS&E: data-enabled dynamic microstructural modeling of flowing complex fluids
- Award number: 2347345
- Fiscal year: 2024
- Amount: $150,000
- Award type: Standard Grant
Collaborative Research: Data-Driven Elastic Shape Analysis with Topological Inconsistencies and Partial Matching Constraints
- Award number: 2402555
- Fiscal year: 2024
- Amount: $150,000
- Award type: Standard Grant
Collaborative Research: EAGER: IMPRESS-U: Groundwater Resilience Assessment through iNtegrated Data Exploration for Ukraine (GRANDE-U)
- Award number: 2409395
- Fiscal year: 2024
- Amount: $150,000
- Award type: Standard Grant
Collaborative Research: Frameworks: MobilityNet: A Trustworthy CI Emulation Tool for Cross-Domain Mobility Data Generation and Sharing towards Multidisciplinary Innovations
- Award number: 2411152
- Fiscal year: 2024
- Amount: $150,000
- Award type: Standard Grant
Collaborative Research: CDS&E: data-enabled dynamic microstructural modeling of flowing complex fluids
- Award number: 2347344
- Fiscal year: 2024
- Amount: $150,000
- Award type: Standard Grant
III : Medium: Collaborative Research: From Open Data to Open Data Curation
- Award number: 2420691
- Fiscal year: 2024
- Amount: $150,000
- Award type: Standard Grant
Collaborative Research: BoCP-Implementation: Integrating Traits, Phylogenies and Distributional Data to Forecast Risks and Resilience of North American Plants
- Award number: 2325835
- Fiscal year: 2024
- Amount: $150,000
- Award type: Standard Grant
Collaborative Research: Fusion of Siloed Data for Multistage Manufacturing Systems: Integrative Product Quality and Machine Health Management
- Award number: 2323083
- Fiscal year: 2024
- Amount: $150,000
- Award type: Standard Grant