III: Small: Partitioning Big Data for the High Performance Computation of Persistent Homology

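Basic information

  • Award number:
    1909096
  • Principal investigator:
  • Amount:
    $499,300
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2019
  • Funding country:
    United States
  • Project period:
    2019-10-01 to 2023-09-30
  • Project status:
    Completed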

Project abstract

New insights from machine learning exist across many domains, including medicine, social media, image processing, biology, and computer and network security. Machine learning can process large, high-dimensional data sets that are beyond human capabilities. One emerging machine learning method is based on topology, a branch of mathematics that can sometimes discover knowledge unavailable to conventional methods. Topology is concerned with the shape of an object, and Persistent Homology is the critical method in topology for extracting the features of a shape. Persistent Homology classifies an object by the size and number of holes and voids in that object. Unfortunately, computing the Persistent Homology of an object requires significant memory and long run times, both of which increase exponentially in the number of points that form the object. This project will treat the object formed by the data and subdivide it into smaller regions for the parallel computation of Persistent Homology on each region. The results from the regional analyses will then be assembled, and any duplicate or missing results will be identified and restored in a post-analysis step. The computation on all of the regions will complete in substantially less time and with much less total memory than a single computation on the entire data set. The methods developed will be tested on a variety of synthetic and real-world data. The synthetic data will permit controlled studies of performance and scalability. Real-world data from a variety of sources will be used, especially data where small topological features are significant (such as data from brain scans). This project will propel the application of topology-based analysis to discover new insights and meaningful information from massive high-dimensional data.
Student training in data mining through topology-based methods will be expanded with the addition of classes, projects (senior projects, MS theses, PhD dissertations, and so on), seminars, and research co-op training experiences. Students at all levels will be impacted, with special emphasis placed on the participation of minority and underrepresented student groups. This project will also participate in the Women in Science and Engineering programs at UC. The project investigators will engage local area K-12 students, international exchange students and researchers at UC's collaborative institutions, UC's Medical School, Cincinnati Children's Medical Center, the Air Force Research Lab, and local industries with information and seminars on this project's investigations and results.
This project proposes to combine the fields of Approximate Computing and Topological Data Analysis to dramatically reduce the computational and memory requirements of applying Topological Data Analysis to very large data sets. In particular, this project will develop approximate methods for computing Persistent Homology that dramatically increase the sizes of data sets to which data mining methods based on topological data analysis can be applied. This project expects to increase the size of the input data set that can be analyzed by Topological Data Analysis methods by at least 3-5 orders of magnitude. While approximate methods can introduce error, the features they identify will mark regions of the point cloud where upscaling steps and regional computations of Persistent Homology can be used (in parallel) to establish more precise boundaries of those features. The project will develop algorithmic improvements and formal statements on the correctness, error bounds, and complexities of the algorithms and approximation techniques.
These techniques have important implications for the ability to apply topological data analysis techniques to much larger data sets than is currently possible.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
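
As a rough illustration only, and not the project's own algorithm, the partition-then-assemble idea in the abstract can be sketched for the simplest case: 0-dimensional Persistent Homology (connected components) of a point cloud under a Vietoris-Rips filtration. In that case the scales at which components merge are the edge lengths of a minimum spanning tree, so each region can be reduced independently and the regional results merged exactly. The function names, the equal-width slab partitioning along the first coordinate, and the restriction to dimension 0 are all assumptions made for this sketch; holes and voids in higher dimensions need substantially more machinery. It requires Python 3.8 or later for `math.dist`.

```python
import math
from itertools import combinations


def kruskal(n, edges):
    """Return minimum-spanning-forest edges of a graph on vertices 0..n-1.

    `edges` holds (length, i, j) tuples. The accepted edge lengths are the
    scales at which connected components (H0 classes) merge.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kept = []
    for length, i, j in sorted(edges):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            kept.append((length, i, j))
    return kept


def all_edges(points, ids):
    """All pairwise edges among the given point indices."""
    return [(math.dist(points[i], points[j]), i, j)
            for i, j in combinations(ids, 2)]


def h0_deaths_direct(points):
    """H0 merge scales from one computation over the whole data set."""
    n = len(points)
    return [e[0] for e in kruskal(n, all_edges(points, range(n)))]


def h0_deaths_partitioned(points, num_regions):
    """Same result, computed region by region and then merged."""
    n = len(points)
    # Assumed partitioning rule: equal-width slabs along the first coordinate.
    xs = [p[0] for p in points]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / num_regions or 1.0
    region = [min(int((x - lo) / width), num_regions - 1) for x in xs]

    candidates = []
    for r in range(num_regions):
        ids = [i for i in range(n) if region[i] == r]
        # Regional computations are independent and could run in parallel.
        candidates += kruskal(n, all_edges(points, ids))

    # Merge step: add edges crossing region boundaries and reduce again.
    # An edge discarded inside a region closes a cycle of shorter edges,
    # so dropping it cannot change the global merge scales.
    candidates += [(math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2)
                   if region[i] != region[j]]
    return [e[0] for e in kruskal(n, candidates)]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
    print(h0_deaths_partitioned(pts, 2))  # [1.0, 1.0, 4.0]
```

In this toy version the merge step still enumerates every cross-region pair, so it shows only that regional results can be combined without losing features, not the time and memory savings the project targets.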

Project outcomes

Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Persistence Homology of Proximity Hyper-Graphs for Higher Dimensional Big Data
Fast Computation of Persistent Homology with Data Reduction and Data Partitioning
Computation of persistent homology on streaming data using topological data summaries
  • DOI:
    10.1111/coin.12597
  • Year published:
    2023
  • Journal:
  • Impact factor:
    2.8
  • Authors:
    Moitra, Anindya; Malott, Nicholas O.; Wilsey, Philip A.
  • Corresponding author:
    Wilsey, Philip A.
Topology Preserving Data Reduction for Computing Persistent Homology
Homology-Separating Triangulated Euler Characteristic Curve

Other grants by Philip Wilsey

SI2-SSE: Scalable Big Data Clustering by Random Projection Hashing
  • Award number:
    1440420
  • Fiscal year:
    2014
  • Funding amount:
    $499,300
  • Project type:
    Standard Grant
CSR: Small: Collaborative Research: Combining Static Analysis and Dynamic Run-time Optimization for Parallel Discrete Event Simulation in Many-Core Environments
  • Award number:
    0915337
  • Fiscal year:
    2009
  • Funding amount:
    $499,300
  • Project type:
    Standard Grant

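Similar NSFC (National Natural Science Foundation of China) grants

Forensic application of circadian-rhythm small RNAs in estimating the time of bloodstain formation
  • Award number:
  • Year approved:
    2024
  • Funding amount:
    CNY 0
  • Project type:
    Provincial/municipal project
Mechanism by which tRNA-derived small RNA upregulates the YBX1/CCL5 pathway in bortezomib-induced chronic pain
  • Award number:
    n/a
  • Year approved:
    2022
  • Funding amount:
    CNY 100,000
  • Project type:
    Provincial/municipal project
Small RNA regulation of the type I-F CRISPR-Cas adaptive immune response and its molecular mechanism
  • Award number:
    32000033
  • Year approved:
    2020
  • Funding amount:
    CNY 240,000
  • Project type:
    Young Scientists Fund
Mechanism by which small RNAs regulate the biocontrol function of Bacillus amyloliquefaciens FZB42
  • Award number:
    31972324
  • Year approved:
    2019
  • Funding amount:
    CNY 580,000
  • Project type:
    General Program
Mechanism by which Streptococcus mutans small RNAs link LuxS quorum sensing to biofilm formation
  • Award number:
    81900988
  • Year approved:
    2019
  • Funding amount:
    CNY 210,000
  • Project type:
    Young Scientists Fund
Function and mechanism of key gut bacterial small RNAs in the onset and progression of Crohn's disease
  • Award number:
    31870821
  • Year approved:
    2018
  • Funding amount:
    CNY 560,000
  • Project type:
    General Program
Molecular mechanism of pigeon milk secretion analyzed by small RNA sequencing
  • Award number:
    31802058
  • Year approved:
    2018
  • Funding amount:
    CNY 260,000
  • Project type:
    Young Scientists Fund
Pathogenic mechanism of rice grassy stunt virus regulated by small RNA-mediated DNA methylation
  • Award number:
    31772128
  • Year approved:
    2017
  • Funding amount:
    CNY 600,000
  • Project type:
    General Program
Immune regulation mechanism of acupuncture treatment for Hashimoto's thyroiditis based on small RNA-seq
  • Award number:
    81704176
  • Year approved:
    2017
  • Funding amount:
    CNY 200,000
  • Project type:
    Young Scientists Fund
Regulation of small RNA biogenesis by rice OsSGS3 and OsHEN1 and its effect on disease resistance
  • Award number:
    91640114
  • Year approved:
    2016
  • Funding amount:
    CNY 850,000
  • Project type:
    Major Research Plan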

Similar overseas grants

AF: SMALL: Submodular Functions and Hypergraphs: Partitioning and Connectivity
  • Award number:
    2402667
  • Fiscal year:
    2024
  • Funding amount:
    $499,300
  • Project type:
    Standard Grant
OAC Core: Small: Architecture and Network-aware Partitioning Algorithms for Scalable PDE Solvers
  • Award number:
    2008772
  • Fiscal year:
    2020
  • Funding amount:
    $499,300
  • Project type:
    Standard Grant
AF: Small: Cuts, Connectivity and Partitioning in Graphs, Hypergraphs and Beyond
  • Award number:
    1907937
  • Fiscal year:
    2019
  • Funding amount:
    $499,300
  • Project type:
    Standard Grant
CCF-BSF: AF: Small: Metric Embeddings and Partitioning for Minor-Closed Graph Families
  • Award number:
    1617790
  • Fiscal year:
    2016
  • Funding amount:
    $499,300
  • Project type:
    Standard Grant
CHS: Small: Printable Partitioning of 3D Models using Level Set Methods
  • Award number:
    1524992
  • Fiscal year:
    2015
  • Funding amount:
    $499,300
  • Project type:
    Standard Grant
AF: Small: Towards better geometric algorithms: Summarizing, partitioning and shrinking data
  • Award number:
    1421231
  • Fiscal year:
    2014
  • Funding amount:
    $499,300
  • Project type:
    Standard Grant
AF: Small: Algorithms for Graph Routing, Drawing and Partitioning
  • Award number:
    1318242
  • Fiscal year:
    2014
  • Funding amount:
    $499,300
  • Project type:
    Standard Grant
AF: Small: Graph Partitioning and Spectral Methods
  • Award number:
    1540685
  • Fiscal year:
    2014
  • Funding amount:
    $499,300
  • Project type:
    Standard Grant
NeTS: Small: Meta-Networking Research: Analysis, Partitioning, and Mapping Tools for Large Experiments
  • Award number:
    1319924
  • Fiscal year:
    2013
  • Funding amount:
    $499,300
  • Project type:
    Standard Grant
AF: Small: Approximation Algorithms for Uncertain Environments and Graph Partitioning
  • Award number:
    1319811
  • Fiscal year:
    2013
  • Funding amount:
    $499,300
  • Project type:
    Standard Grant