SHF: EAGER: HI-HDFS - Holistic I/O optimizations for the Hadoop distributed filesystem

Basic information

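  • Award number:
    1747447
  • Principal investigator:
    Spyros Blanas
  • Award amount:
    $150,000
  • Host institution:
    Ohio State University
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2017
  • Funding country:
    United States
  • Project period:
    2017-09-01 to 2018-08-31
  • Status:
    Completed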

Project abstract

File systems and their outdated POSIX "byte stream" interface suffer from an impedance mismatch with the versatile I/O requirements of today's applications. Specifically, the I/O path from the application to the raw storage device is growing longer and involves the interplay of intricate software and hardware components. This produces complex aggregate I/O patterns that application developers (often subject matter experts with limited knowledge of how massive concurrency creates I/O bottlenecks) cannot optimize on intuition alone. File systems that tout their high scalability, such as the Hadoop distributed file system (HDFS), largely do so by restricting applications to sequential access patterns. It remains an open question whether HDFS I/O performance can be accelerated for analytical applications whose complex data models prevent the data from being readily serialized contiguously for fast sequential access. This project seeks to answer this question by building HI-HDFS, a framework that automatically collects and manages semantically richer I/O metadata to guide object placement in HDFS. The HI-HDFS framework synthesizes the I/O activity of software components throughout the datacenter into a navigable graph structure to identify application-agnostic motifs in I/O activity. A novel I/O forecasting technique identifies and ameliorates bottlenecks at large scale by inspecting I/O activity from small-scale runs. Overall, the HI-HDFS framework challenges the I/O optimization mantra that manual data placement is the cornerstone of I/O performance, and it paves the way toward next-generation object-centric storage systems for high-performance computers. The efficacy of this automated approach will be evaluated on a complex data processing workload from the domain of emergency response that exhibits I/O patterns characteristic of modern analytical applications.
The broader impacts of this work are expected to include open-source prototype implementations as well as pedagogical impact on a cloud computing course for both Computer Science and Data Analytics undergraduate majors at Ohio State.

Project outcomes

Journal articles (5)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
ATP: Directed Graph Embedding with Asymmetric Transitivity Preservation
  • DOI:
    10.1609/aaai.v33i01.3301265
  • Published:
    2018-11
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jiankai Sun; Bortik Bandyopadhyay; Armin Bashizade; Jiongqian Liang; P. Sadayappan; S. Parthasarathy
  • Corresponding author:
    Jiankai Sun; Bortik Bandyopadhyay; Armin Bashizade; Jiongqian Liang; P. Sadayappan; S. Parthasarathy
ArrayBridge: Interweaving Declarative Array Processing in SciDB with Imperative HDF5-Based Programs
ApproxJoin: Approximate Distributed Joins
  • DOI:
    10.1145/3267809.3267834
  • Published:
    2018-10
  • Journal:
  • Impact factor:
    0
  • Authors:
    D. Quoc; Istemi Ekin Akkus; Pramod Bhatotia; Spyros Blanas; Ruichuan Chen; C. Fetzer; T. Strufe
  • Corresponding author:
    D. Quoc; Istemi Ekin Akkus; Pramod Bhatotia; Spyros Blanas; Ruichuan Chen; C. Fetzer; T. Strufe
Characterizing I/O optimization opportunities for array-centric applications on HDFS
  • DOI:
    10.1109/hpec.2018.8547529
  • Published:
    2018-09
  • Journal:
  • Impact factor:
    0
  • Authors:
    Donghe Kang; Vedang Patel; Kalyan Khandrika; Spyros Blanas; Yang Wang; S. Parthasarathy
  • Corresponding author:
    Donghe Kang; Vedang Patel; Kalyan Khandrika; Spyros Blanas; Yang Wang; S. Parthasarathy
Evaluating Scalability Bottlenecks by Workload Extrapolation

Other publications by Spyros Blanas

In-Memory Transactions
Query Processing on Gaming Consoles
  • DOI:
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Wei Cui; Qianxi Zhang; Spyros Blanas; Jesús Camacho; Brandon Haynes; Yinan Li; Ravishankar Ramamurthy; Peng Cheng; Rathijit Sen; Matteo Interlandi
  • Corresponding author:
    Matteo Interlandi
Engineering Security and Performance with Cipherbase
  • DOI:
  • Published:
    2012
  • Journal:
  • Impact factor:
    0
  • Authors:
    A. Arasu; Spyros Blanas; Ken Eguro; Manas R. Joglekar; R. Kaushik; Donald Kossmann; Ravishankar Ramamurthy; P. Upadhyaya; R. Venkatesan
  • Corresponding author:
    R. Venkatesan
ApproxJoin
GRaSP: generalized range search in peer-to-peer networks

Other grants by Spyros Blanas

SHF: Small: Hyperscaling Data Analytics for High-Performance Computers
  • Award number:
    1816577
  • Fiscal year:
    2018
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
CRII: III: Declarative array processing for large-scale scientific analyses
  • Award number:
    1464381
  • Fiscal year:
    2015
  • Award amount:
    $150,000
  • Award type:
    Standard Grant

Similar international grants

Collaborative Research: EAGER: The next crisis for coral reefs is how to study vanishing coral species; AUVs equipped with AI may be the only tool for the job
  • Award number:
    2333604
  • Fiscal year:
    2024
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
EAGER/Collaborative Research: An LLM-Powered Framework for G-Code Comprehension and Retrieval
  • Award number:
    2347624
  • Fiscal year:
    2024
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
EAGER: Innovation in Society Study Group
  • Award number:
    2348836
  • Fiscal year:
    2024
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
EAGER: Artificial Intelligence to Understand Engineering Cultural Norms
  • Award number:
    2342384
  • Fiscal year:
    2024
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
EAGER/Collaborative Research: Revealing the Physical Mechanisms Underlying the Extraordinary Stability of Flying Insects
  • Award number:
    2344215
  • Fiscal year:
    2024
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
Collaborative Research: EAGER: Designing Nanomaterials to Reveal the Mechanism of Single Nanoparticle Photoemission Intermittency
  • Award number:
    2345581
  • Fiscal year:
    2024
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
Collaborative Research: EAGER: Designing Nanomaterials to Reveal the Mechanism of Single Nanoparticle Photoemission Intermittency
  • Award number:
    2345582
  • Fiscal year:
    2024
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
Collaborative Research: EAGER: Designing Nanomaterials to Reveal the Mechanism of Single Nanoparticle Photoemission Intermittency
  • Award number:
    2345583
  • Fiscal year:
    2024
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
EAGER: Accelerating decarbonization by representing catalysts with natural language
  • Award number:
    2345734
  • Fiscal year:
    2024
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
  • Award number:
    2404989
  • Fiscal year:
    2024
  • Award amount:
    $150,000
  • Award type:
    Standard Grant