EAGER: High Performance Algorithms for Interactive Data Science at Scale

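Basic Information

  • Award number:
    2109988
  • Principal investigator:
  • Amount:
    $187,400
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Period:
    2021-03-01 to 2025-06-30
  • Status:
    Not yet concluded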

Project Abstract

A real-world challenge in data science is to develop interactive methods for quickly analyzing new and novel data sets that are potentially of massive scale. This award will design and implement fundamental algorithms for high performance computing solutions that enable interactive analysis of massive data sets. Based on the widely used data types and structures of strings, sets, matrices and graphs, this methodology will produce efficient and scalable software for three classes of fundamental algorithms that will drastically improve the performance on a wide range of real-world queries or directly realize frequent queries. These innovations will allow the broad community to move massive-scale data exploration from time-consuming batch processing to interactive analyses that give a data analyst the ability to comprehensively, deeply and efficiently explore the insights and science in real-world data sets. By enabling the increasing number of developers to easily manipulate large data sets, this will greatly enlarge the data science community and find much broader use in new communities. Materials from this project will be included in graduate and undergraduate course curricula. In particular, women, high school students and other underrepresented groups in STEM areas will be encouraged to participate in this research activity. This project focuses on three important data structures for data analytics: 1) suffix array construction, 2) 'treap' construction and 3) distributed memory join algorithms, useful for analyzing large scale strings, implementing random search in large string data sets, and generating new relations, respectively. These fundamental algorithms serve as the cornerstone to support interactive data science at scale.
Based on the theoretical achievements and systematic algorithm design, a novel symbiotic optimization methodology that can combine the theoretical analysis, data structure features, and typical data distribution features together as a whole will be developed to significantly improve the practical performance of the proposed algorithms. To evaluate and show the effectiveness of the proposed algorithms, these algorithms will be implemented in and contribute to an open source NumPy-like software framework that aims to provide productive data discovery tools on massive, dozens-of-terabytes data sets by bringing together the productivity of Python with world-class high performance computing. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
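
For readers unfamiliar with the first item in the abstract's list, the following is a minimal, illustrative Python sketch of what a suffix array is and how it supports substring search. It is not the project's algorithm: it builds the array by naively sorting suffixes on a single machine, whereas the award targets high-performance construction at massive scale. The function names `build_suffix_array` and `find_occurrences` are hypothetical and chosen only for this example.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes of `text` in lexicographic order.

    Naive construction by sorting the suffixes directly; for illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return the sorted start positions at which `pattern` occurs in `text`."""
    m = len(pattern)

    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with the pattern.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)
    print(find_occurrences(text, sa, "ana"))
```

For the text "banana", the sorted suffixes start at positions 5, 3, 1, 0, 4, 2, and a query for "ana" returns positions 1 and 3. Because all suffixes sharing a prefix sit next to each other in the array, each query needs only two binary searches.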

Project Outcomes

Journal articles (26)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Triangle Counting Through Cover-Edges
Triangle Centrality in Arkouda
Parallel Longest Common SubSequence Analysis In Chapel
Anti-Section Transitive Closure
Fast Triangle Counting

Other publications by David Bader

The effect of combined spinal-epidural anesthesia versus general anesthesia on the recovery time of intestinal function in young infants undergoing intestinal surgery: a randomized, prospective, controlled trial
  • DOI:
    10.1016/j.jclinane.2012.02.004
  • Published:
    2012-09-01
  • Journal:
  • Impact factor:
  • Authors:
    Mostafa Somri;Ibrahim Matter;Constantinos A. Parisinos;Ron Shaoul;Jorge G. Mogilner;David Bader;Eldar Asphandiarov;Luis A. Gaitini
  • Corresponding author:
    Luis A. Gaitini
DECREASED LYMPHOCYTIC BETA ADRENORECEPTOR BINDING CAPACITY IN APNEA OF INFANCY
  • DOI:
    10.1203/00006450-198704010-00256
  • Published:
    1987-04-01
  • Journal:
  • Impact factor:
    3.100
  • Authors:
    David Bader;S Buckley;T G Keens;D Warburton
  • Corresponding author:
    D Warburton
Investigating an interchangeable potential between heart and gut mesothelial development
  • DOI:
    10.1016/j.ydbio.2011.05.236
  • Published:
    2011-08-01
  • Journal:
  • Impact factor:
  • Authors:
    Rebecca T. Thomason;Niki Winters;Emily Cross;David Bader
  • Corresponding author:
    David Bader
Local cues influence atrial and ventricular differentiation of precardiac mesoderm
  • DOI:
    10.1016/s0022-2828(87)80673-9
  • Published:
    1987-01-01
  • Journal:
  • Impact factor:
  • Authors:
    Jonathan Satin;David Bader;Robert L. DeHaan
  • Corresponding author:
    Robert L. DeHaan
Unintended Consequence: Diversity as a Casualty of Eliminating United States Medical Licensing Examination Step 1 Scores
  • DOI:
    10.1016/j.jacr.2023.07.019
  • Published:
    2023-11-01
  • Journal:
  • Impact factor:
  • Authors:
    Felipe M. Campos;Lars J. Grimm;Charles M. Maxfield;Sabina Amin;David Bader;Brooke Beckett;Kevin Carter;Teresa Chapman;Bernard Chow;Amanda Derylo;Francis Flaherty;Michael Fox;Jennifer Gould;Robert Groves;Darel Heitkamp;John Heymann;Christopher Ho;Marion Hughes;Nathan Hull;Abtin Jafroodifar
  • Corresponding author:
    Abtin Jafroodifar

Other grants by David Bader

Collaborative Research: PPoSS: Planning: Streamware - A Scalable Framework for Accelerating Streaming Data Science
  • Award number:
    2118458
  • Fiscal year:
    2021
  • Amount:
    $187,400
  • Award type:
    Standard Grant
Collaborative Research: PPoSS: Planning: Extreme-scale Sparse Data Analytics
  • Award number:
    2118385
  • Fiscal year:
    2021
  • Amount:
    $187,400
  • Award type:
    Standard Grant
Collaborative Research: EMBRACE: Evolvable Methods for Benchmarking Realism through Application and Community Engagement
  • Award number:
    1535058
  • Fiscal year:
    2015
  • Amount:
    $187,400
  • Award type:
    Standard Grant
Collaborative Research: IEEE IPDPS Conference Student Participation Support
  • Award number:
    1362300
  • Fiscal year:
    2014
  • Amount:
    $187,400
  • Award type:
    Standard Grant
EAGER: Collaborative Research: Using PDE Descriptions to Generate Code Precisely Tailored to Energy-Constrained Systems Including Large GPU Accelerated Clusters
  • Award number:
    1265434
  • Fiscal year:
    2013
  • Amount:
    $187,400
  • Award type:
    Standard Grant
SI2-SSI: Collaborative: The XScala Project: A Community Repository for Model-Driven Design and Tuning of Data-Intensive Applications for Extreme-Scale Accelerator-Based Systems
  • Award number:
    1339745
  • Fiscal year:
    2013
  • Amount:
    $187,400
  • Award type:
    Standard Grant
Collaborative Research: Software Infrastructure for Accelerating Grand Challenge Science with Future Computing Platforms
  • Award number:
    1216504
  • Fiscal year:
    2012
  • Amount:
    $187,400
  • Award type:
    Standard Grant
Collaborative Research: Understanding Whole-genome Evolution through Petascale Simulation
  • Award number:
    0904461
  • Fiscal year:
    2009
  • Amount:
    $187,400
  • Award type:
    Standard Grant
Collaborative Research: Establishing an I/UCRC Center for Multicore Productivity Research (CMPR)
  • Award number:
    0831110
  • Fiscal year:
    2008
  • Amount:
    $187,400
  • Award type:
    Standard Grant
Collaborative Research: CRI: IAD: Development of a Research Infrastructure
  • Award number:
    0708307
  • Fiscal year:
    2007
  • Amount:
    $187,400
  • Award type:
    Continuing Grant

Similar overseas grants

CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
  • Award number:
    2348346
  • Fiscal year:
    2024
  • Amount:
    $187,400
  • Award type:
    Standard Grant
CAREER: Improving Real-world Performance of AI Biosignal Algorithms
  • Award number:
    2339669
  • Fiscal year:
    2024
  • Amount:
    $187,400
  • Award type:
    Continuing Grant
OAC Core: High Performance Computing Algorithms and Software for large-scale Mass Spectrometry based Omics
  • Award number:
    2312599
  • Fiscal year:
    2023
  • Amount:
    $187,400
  • Award type:
    Standard Grant
AF: Small: RUI: Toward High-Performance Block Krylov Subspace Algorithms for Solving Large-Scale Linear Systems
  • Award number:
    2327619
  • Fiscal year:
    2023
  • Amount:
    $187,400
  • Award type:
    Standard Grant
High Performance Graph Algorithms and Data Structures
  • Award number:
    RGPIN-2022-03207
  • Fiscal year:
    2022
  • Amount:
    $187,400
  • Award type:
    Discovery Grants Program - Individual
Performance-Based Earthquake Engineering 2.0: Machine-Learning and Artificial Intelligence Algorithms for seismic hazard and vulnerability.
  • Award number:
    2765246
  • Fiscal year:
    2022
  • Amount:
    $187,400
  • Award type:
    Studentship
Collaborative Research: SHF: Medium: Co-optimizing Spectral Algorithms and Systems for High-Performance Graph Learning
  • Award number:
    2212370
  • Fiscal year:
    2022
  • Amount:
    $187,400
  • Award type:
    Continuing Grant
CAREER: Enabling Progressive Data Analytics for High Performance Computing: Algorithms and System Support
  • Award number:
    2144403
  • Fiscal year:
    2022
  • Amount:
    $187,400
  • Award type:
    Continuing Grant
Design and Analysis of Algorithms for High-Performance Scientific Computing
  • Award number:
    RGPIN-2019-05692
  • Fiscal year:
    2022
  • Amount:
    $187,400
  • Award type:
    Discovery Grants Program - Individual
Collaborative Research: SHF: Medium: Co-optimizing Spectral Algorithms and Systems for High-Performance Graph Learning
  • Award number:
    2212371
  • Fiscal year:
    2022
  • Amount:
    $187,400
  • Award type:
    Continuing Grant