CRII: OAC: Scalable Cyberinfrastructure for Big Graph and Matrix/Tensor Analytics

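Basic information

  • Award number:
    1755464
  • Principal investigator:
  • Amount:
    $170,900
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2018
  • Funding country:
    United States
  • Period:
    2018-06-01 to 2022-05-31
  • Project status:
    Completed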

Project abstract

Existing distributed graph and matrix analytics frameworks are designed with data-intensive workloads in mind, which makes them inefficient for compute-intensive applications such as graph mining and scientific computing. The goal of this project is to develop novel big data frameworks for two compute-intensive tasks: graph mining and matrix/tensor computations. The two frameworks advance the field of big data analytics by motivating future systems for compute-intensive analytics and by promoting their application in various scientific areas to improve research productivity. The two systems will be available for public use and can serve several cross-disciplinary projects in computer forensics, computational physics, and bioinformatics. The project includes mentoring graduate students and training K-12 students through summer internships, as well as related new course materials and outreach activities to help the public learn big data technologies. Thus, the project aligns with the NSF's mission to promote the progress of science and to advance the national health and prosperity.

The graph mining system and the matrix/tensor platform share the design of (i) a tailor-made storage subsystem providing efficient and flexible data access, and (ii) a computation subsystem with fine-grained task control for data-reuse-aware task assignment and load balancing. The graph mining system, called G-thinker, aims to facilitate the writing of distributed programs that mine from a big graph those subgraphs that satisfy certain requirements. Such mining problems are useful in many applications, such as community detection and subgraph matching. These problems usually have a high computational complexity, and existing serial algorithms tackle them by backtracking in a duplication-free vertex-set enumeration tree, which recursively partitions the search space. G-thinker adopts an intuitive programming interface that minimizes the effort of adapting an existing serial subgraph mining algorithm for distributed execution. The subgraphs to mine are spawned from individual vertices and grow their frontiers as needed, and memory overflow is avoided by spilling subgraphs to disk when needed. In each machine, vertices and edges shared by multiple subgraphs need only be transmitted and cached once, which minimizes communication (and hence data waiting) so that CPU cores are better utilized. To address the load-balancing problem of power-law graphs, G-thinker explores recursive decomposition and work stealing to allow idle machines to steal subgraphs for mining from heavily loaded machines.

The project also explores a distributed matrix/tensor storage and computing framework, where matrix/tensor partitions are stored in multiple replicas using different storage schemes to efficiently support all kinds of submatrix access operations. This flexible storage scheme offers the upper-layer computations many more opportunities for fine-grained optimizations, including smarter task scheduling and in-situ updates. The use of this framework is exemplified by matrix multiplication and LU factorization. Both of the proposed frameworks can help build a cyberinfrastructure for collaborations with scientists in science, medicine, and industry.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
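
As a rough illustration of the serial pattern the abstract describes (backtracking over a duplication-free set-enumeration tree, with one task spawned per vertex), here is a minimal single-machine Python sketch that enumerates k-cliques. It is not G-thinker's programming interface: the names `spawn_tasks`, `mine_cliques`, and `k_cliques` are hypothetical, the choice of k-cliques as the mining problem is an assumption made for brevity, and distribution, vertex caching, disk spilling, and work stealing are all omitted.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical graph representation: vertex ID -> set of neighbor IDs (undirected).
Graph = Dict[int, Set[int]]


def spawn_tasks(graph: Graph) -> Iterator[Tuple[List[int], Set[int]]]:
    """Spawn one task per vertex: (partial subgraph, candidate extension set).

    Only neighbors with a larger ID are candidates, so every vertex set is
    reached along exactly one path of the set-enumeration tree.
    """
    for v in sorted(graph):
        yield [v], {u for u in graph[v] if u > v}


def mine_cliques(graph: Graph, partial: List[int], candidates: Set[int],
                 k: int, out: List[Tuple[int, ...]]) -> None:
    """Backtracking search: grow `partial` by one candidate at a time."""
    if len(partial) == k:
        out.append(tuple(partial))
        return
    for u in sorted(candidates):
        # Keep candidates that are adjacent to u and come after u in ID order.
        next_candidates = {w for w in candidates if w > u and w in graph[u]}
        mine_cliques(graph, partial + [u], next_candidates, k, out)


def k_cliques(graph: Graph, k: int) -> List[Tuple[int, ...]]:
    """Run every vertex-rooted task serially and collect all k-cliques."""
    out: List[Tuple[int, ...]] = []
    for partial, candidates in spawn_tasks(graph):
        mine_cliques(graph, partial, candidates, k, out)
    return out


if __name__ == "__main__":
    # A 4-clique on vertices 1..4 with a pendant vertex 5 attached to 4.
    demo = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3, 5}, 5: {4}}
    print(k_cliques(demo, 3))
```

Each `(partial, candidates)` pair in this sketch is an independent unit of work, which is the property that makes this style of search amenable to the task-level scheduling and work stealing mentioned in the abstract.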

Project outcomes

Journal articles (17)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
The Future is Big Graphs! A Community View on Graph Processing Systems
  • DOI:
  • Publication date:
    2021
  • Journal:
  • Impact factor:
    22.7
  • Authors:
    Sakr, Sherif;Bonifati, Angela;Voigt, Hannes;Iosup, Alexandru;Ammar, Khaled;Angles, Renzo;Aref, Walid G.;Arenas, Marcelo;Besta, Maciej;Boncz, Peter A.
  • Corresponding author:
    Boncz, Peter A.
Parallel mining of large maximal quasi-cliques
  • DOI:
    10.1007/s00778-021-00712-2
  • Publication date:
    2021-11
  • Journal:
  • Impact factor:
    0
  • Authors:
    J. Khalil;Da Yan;Guimu Guo;Lyuheng Yuan
  • Corresponding author:
    J. Khalil;Da Yan;Guimu Guo;Lyuheng Yuan
T-thinker: a task-centric distributed framework for compute-intensive divide-and-conquer algorithms
G-thinker: a general distributed framework for finding qualified subgraphs in a big graph with load balancing
  • DOI:
    10.1007/s00778-021-00688-z
  • Publication date:
    2021-08
  • Journal:
  • Impact factor:
    0
  • Authors:
    Da Yan;Guimu Guo;J. Khalil;M. Tamer Özsu;Wei-Shinn Ku;John C.S. Lui
  • Corresponding author:
    Da Yan;Guimu Guo;J. Khalil;M. Tamer Özsu;Wei-Shinn Ku;John C.S. Lui
PrefixFPM: A Parallel Framework for General-Purpose Frequent Pattern Mining

Other publications by Da Yan

Spatial-Logic-Aware Weakly Supervised Learning for Flood Mapping on Earth Imagery
  • DOI:
    10.1609/aaai.v38i20.30253
  • Publication date:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Zelin Xu;Tingsong Xiao;Wenchong He;Yu Wang;Zhe Jiang;Shigang Chen;Yiqun Xie;Xiaowei Jia;Da Yan;Yang Zhou
  • Corresponding author:
    Yang Zhou
Ten questions on future and extreme weather data for building simulation and analysis in a changing climate
  • DOI:
    10.1016/j.buildenv.2024.112461
  • Publication date:
    2025-02-01
  • Journal:
  • Impact factor:
    7.600
  • Authors:
    Da Yan;Yi Wu;Jeetika Malik;Tianzhen Hong
  • Corresponding author:
    Tianzhen Hong
A district-level building electricity use profile simulation model based on probability distribution inferences
  • DOI:
    10.1016/j.scs.2024.105822
  • Publication date:
    2024-11-15
  • Journal:
  • Impact factor:
  • Authors:
    Xuyuan Kang;Hongyin Chen;Zhenlan Dou;Xiao Wang;Zhaoru Liu;Chunyan Zhang;Kunqi Jia;Da Yan
  • Corresponding author:
    Da Yan
Scientometric mapping of smart building research: Towards a framework of human-cyber-physical system (HCPS)
  • DOI:
    10.1016/j.autcon.2021.103776
  • Publication date:
    2021-09
  • Journal:
  • Impact factor:
    10.3
  • Authors:
    Peixian Li;Yujie Lu;Da Yan;Jianzhuang Xiao;Huicang Wu
  • Corresponding author:
    Huicang Wu
Towards Understanding Sycophancy in Language Models
  • DOI:
  • Publication date:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Mrinank Sharma;Meg Tong;Tomasz Korbak;D. Duvenaud;Amanda Askell;Samuel R. Bowman;Newton Cheng;Esin Durmus;Zac Hatfield;Scott Johnston;Shauna Kravec;Tim Maxwell;Sam McCandlish;Kamal Ndousse;Oliver Rausch;Nicholas Schiefer;Da Yan;Miranda Zhang;Ethan Perez
  • Corresponding author:
    Ethan Perez

Other grants by Da Yan

Collaborative Research: OAC CORE: Federated-Learning-Driven Traffic Event Management for Intelligent Transportation Systems
  • Award number:
    2414474
  • Fiscal year:
    2024
  • Award amount:
    $170,900
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Large-Scale Spatial Machine Learning for 3D Surface Topology in Hydrological Applications
  • Award number:
    2414185
  • Fiscal year:
    2024
  • Award amount:
    $170,900
  • Project type:
    Standard Grant
RII Track-4: NSF: Massively Parallel Graph Processing on Next-Generation Multi-GPU Supercomputers
  • Award number:
    2229394
  • Fiscal year:
    2023
  • Award amount:
    $170,900
  • Project type:
    Standard Grant
Collaborative Research: OAC CORE: Federated-Learning-Driven Traffic Event Management for Intelligent Transportation Systems
  • Award number:
    2313192
  • Fiscal year:
    2023
  • Award amount:
    $170,900
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Large-Scale Spatial Machine Learning for 3D Surface Topology in Hydrological Applications
  • Award number:
    2106461
  • Fiscal year:
    2021
  • Award amount:
    $170,900
  • Project type:
    Standard Grant

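Similar grants from the National Natural Science Foundation of China

Molecular basis by which Z8-12:OH and Z8-14:OAc respectively maintain sex-attractant specificity in the oriental fruit moth and the plum fruit moth
  • Award number:
  • Approval year:
    2021
  • Award amount:
    350,000 CNY
  • Project type:
    Regional Science Fund Project
Mechanistic study of the photoisomerization of the nitrosyl ruthenium complex [Ru(OAc)(2mqn)2NO]
  • Award number:
    21603131
  • Approval year:
    2016
  • Award amount:
    190,000 CNY
  • Project type:
    Young Scientists Fund Project
Study of Mn(OAc)3-promoted radical cascade reactions under mechanochemical conditions
  • Award number:
    21242013
  • Approval year:
    2012
  • Award amount:
    100,000 CNY
  • Project type:
    Special Fund Project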

Similar overseas grants

OAC Core: A Scalable and Deployable Container Orchestration Cyber Infrastructure Toolkit for Deploying Big Data Analytics Applications in Public Cloud
  • Award number:
    2313738
  • Fiscal year:
    2023
  • Award amount:
    $170,900
  • Project type:
    Standard Grant
OAC Core: Geometry-aware and Deep Learning-based Cyberinfrastructure for Scalable Modeling of Solids and Fluids
  • Award number:
    2211908
  • Fiscal year:
    2022
  • Award amount:
    $170,900
  • Project type:
    Standard Grant
OAC Core: A Scalable and Deployable Container Orchestration Cyber Infrastructure Toolkit for Deploying Big Data Analytics Applications in Public Cloud
  • Award number:
    2212256
  • Fiscal year:
    2022
  • Award amount:
    $170,900
  • Project type:
    Standard Grant
OAC Core: Scalable Graph ML on Distributed Heterogeneous Systems
  • Award number:
    2209563
  • Fiscal year:
    2022
  • Award amount:
    $170,900
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Robust, Scalable, and Practical Low Rank Approximation
  • Award number:
    2106738
  • Fiscal year:
    2021
  • Award amount:
    $170,900
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Robust, Scalable, and Practical Low-Rank Approximation
  • Award number:
    2106920
  • Fiscal year:
    2021
  • Award amount:
    $170,900
  • Project type:
    Standard Grant
OAC Core: Small: Efficient and scalable tools for design and analysis of active matter systems
  • Award number:
    2007181
  • Fiscal year:
    2020
  • Award amount:
    $170,900
  • Project type:
    Standard Grant
OAC Core: Small: Architecture and Network-aware Partitioning Algorithms for Scalable PDE Solvers
  • Award number:
    2008772
  • Fiscal year:
    2020
  • Award amount:
    $170,900
  • Project type:
    Standard Grant
CRII: OAC: Scalable and Integrated Data Collection Platforms for Connected Vehicle Data
  • Award number:
    1948066
  • Fiscal year:
    2020
  • Award amount:
    $170,900
  • Project type:
    Standard Grant
OAC Core: Small: Collaborative Research: Scalable Run-Time for Highly Parallel, Heterogeneous Systems
  • Award number:
    1908144
  • Fiscal year:
    2019
  • Award amount:
    $170,900
  • Project type:
    Standard Grant