SHF: EAGER: HI-HDFS - Holistic I/O optimizations for the Hadoop distributed filesystem
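Basic information
- Award number: 1747447
- Principal investigator:
- Amount: $150,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2017
- Funding country: United States
- Project period: 2017-09-01 to 2018-08-31
- Project status: Completed
- Source:
- Keywords: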
Project abstract
File systems and their outdated POSIX "byte stream" interface suffer from an impedance mismatch with the versatile I/O requirements of today's applications. Specifically, the I/O path from the application to the raw storage device is becoming longer and it involves the interplay of intricate software and hardware components. This produces complex aggregate I/O patterns that application developers (often subject matter experts with limited knowledge of how massive concurrency creates I/O bottlenecks) cannot optimize based on intuition alone. File systems that tout their high scalability, such as the Hadoop distributed file system, largely do so by limiting applications to sequential access patterns. The question of whether one can accelerate the I/O performance of the Hadoop distributed file system for analytical applications with complex data models that cannot readily serialize data contiguously for fast sequential access remains open. This project seeks to address this question and build HI-HDFS -- a framework that automatically collects and manages semantically richer I/O metadata to guide object placement in the Hadoop distributed file system. The HI-HDFS framework synthesizes the I/O activity across software components throughout the datacenter in a navigable graph structure to identify application-agnostic motifs in I/O activity. A novel I/O forecasting technique identifies and ameliorates bottlenecks at large scale by inspecting I/O activity from small-scale runs. Overall, the HI-HDFS framework challenges the I/O optimization mantra that manual data placement is the cornerstone of I/O performance and paves the way towards next-generation object-centric storage systems for high-performance computers. The efficacy of this automated approach will be examined on a complex data processing workload from the domain of emergency response which exhibits I/O patterns that are characteristic of modern analytical applications. 
The broader impacts of this work are expected to include open-source prototype implementations as well as pedagogical impact on a cloud computing course for both Computer Science and Data Analytics undergraduate majors at Ohio State.
Project outcomes
Journal articles (5)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
ATP: Directed Graph Embedding with Asymmetric Transitivity Preservation
- DOI: 10.1609/aaai.v33i01.3301265
- Published: 2018-11
- Journal:
- Impact factor: 0
- Authors: Jiankai Sun; Bortik Bandyopadhyay; Armin Bashizade; Jiongqian Liang; P. Sadayappan; S. Parthasarathy
- Corresponding authors: Jiankai Sun; Bortik Bandyopadhyay; Armin Bashizade; Jiongqian Liang; P. Sadayappan; S. Parthasarathy
ArrayBridge: Interweaving Declarative Array Processing in SciDB with Imperative HDF5-Based Programs
- DOI: 10.1109/icde.2018.00092
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Xing, Haoyuan; Floratos, Sofoklis; Blanas, Spyros; Byna, Suren; Prabhat, M.; Wu, Kesheng; Brown, Paul
- Corresponding author: Brown, Paul
ApproxJoin: Approximate Distributed Joins
- DOI: 10.1145/3267809.3267834
- Published: 2018-10
- Journal:
- Impact factor: 0
- Authors: D. Quoc; Istemi Ekin Akkus; Pramod Bhatotia; Spyros Blanas; Ruichuan Chen; C. Fetzer; T. Strufe
- Corresponding authors: D. Quoc; Istemi Ekin Akkus; Pramod Bhatotia; Spyros Blanas; Ruichuan Chen; C. Fetzer; T. Strufe
Characterizing I/O optimization opportunities for array-centric applications on HDFS
- DOI: 10.1109/hpec.2018.8547529
- Published: 2018-09
- Journal:
- Impact factor: 0
- Authors: Donghe Kang; Vedang Patel; Kalyan Khandrika; Spyros Blanas; Yang Wang; S. Parthasarathy
- Corresponding authors: Donghe Kang; Vedang Patel; Kalyan Khandrika; Spyros Blanas; Yang Wang; S. Parthasarathy
Evaluating Scalability Bottlenecks by Workload Extrapolation
- DOI: 10.1109/mascots.2018.00039
- Published: 2018-09
- Journal:
- Impact factor: 0
- Authors: Rong Shi; Yifan Gan; Yang Wang
- Corresponding authors: Rong Shi; Yifan Gan; Yang Wang
Other publications by Spyros Blanas
In-Memory Transactions
- DOI: 10.1007/978-3-319-63962-8_177-1
- Published: 2019
- Journal:
- Impact factor: 0
- Authors: Spyros Blanas
- Corresponding author: Spyros Blanas
Query Processing on Gaming Consoles
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Wei Cui; Qianxi Zhang; Spyros Blanas; Jesús Camacho; Brandon Haynes; Yinan Li; Ravishankar Ramamurthy; Peng Cheng; Rathijit Sen; Matteo Interlandi
- Corresponding author: Matteo Interlandi
Engineering Security and Performance with Cipherbase
- DOI:
- Published: 2012
- Journal:
- Impact factor: 0
- Authors: A. Arasu; Spyros Blanas; Ken Eguro; Manas R. Joglekar; R. Kaushik; Donald Kossmann; Ravishankar Ramamurthy; P. Upadhyaya; R. Venkatesan
- Corresponding author: R. Venkatesan
ApproxJoin
- DOI:
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: D. Quoc; Istemi Ekin Akkus; Pramod Bhatotia; Spyros Blanas; Ruichuan Chen; Christof Fetzer; Thorsten Strufe
- Corresponding author: Thorsten Strufe
GRaSP: generalized range search in peer-to-peer networks
- DOI: 10.4108/icst.infoscale2008.3533
- Published: 2008
- Journal:
- Impact factor: 0
- Authors: M. Argyriou; V. Samoladas; Spyros Blanas
- Corresponding author: Spyros Blanas
Other grants by Spyros Blanas
SHF: Small: Hyperscaling Data Analytics for High-Performance Computers
- Award number: 1816577
- Fiscal year: 2018
- Award amount: $150,000
- Project type: Standard Grant
CRII: III: Declarative array processing for large-scale scientific analyses
- Award number: 1464381
- Fiscal year: 2015
- Award amount: $150,000
- Project type: Standard Grant
Similar overseas grants
EAGER: A Genome Wide HDR Enhancement Screen in Maize
- Award number: 2409037
- Fiscal year: 2024
- Award amount: $150,000
- Project type: Standard Grant
Collaborative Research: EAGER: IMPRESS-U: Groundwater Resilience Assessment through iNtegrated Data Exploration for Ukraine (GRANDE-U)
- Award number: 2409395
- Fiscal year: 2024
- Award amount: $150,000
- Project type: Standard Grant
EAGER: Integrating Pathological Image and Biomedical Text Data for Clinical Outcome Prediction
- Award number: 2412195
- Fiscal year: 2024
- Award amount: $150,000
- Project type: Standard Grant
EAGER: Generalizing Monin-Obukhov Similarity Theory (MOST)-based Surface Layer Parameterizations for Turbulence Resolving Earth System Models (ESMs)
- Award number: 2414424
- Fiscal year: 2024
- Award amount: $150,000
- Project type: Standard Grant
EAGER: Creating a Composite EL Nino Record from the Lowland Neotropics
- Award number: 2417794
- Fiscal year: 2024
- Award amount: $150,000
- Project type: Standard Grant
EAGER/Collaborative Research: An LLM-Powered Framework for G-Code Comprehension and Retrieval
- Award number: 2347624
- Fiscal year: 2024
- Award amount: $150,000
- Project type: Standard Grant
EAGER: Innovation in Society Study Group
- Award number: 2348836
- Fiscal year: 2024
- Award amount: $150,000
- Project type: Standard Grant
EAGER: Artificial Intelligence to Understand Engineering Cultural Norms
- Award number: 2342384
- Fiscal year: 2024
- Award amount: $150,000
- Project type: Standard Grant
EAGER/Collaborative Research: Revealing the Physical Mechanisms Underlying the Extraordinary Stability of Flying Insects
- Award number: 2344215
- Fiscal year: 2024
- Award amount: $150,000
- Project type: Standard Grant
Collaborative Research: EAGER: Designing Nanomaterials to Reveal the Mechanism of Single Nanoparticle Photoemission Intermittency
- Award number: 2345581
- Fiscal year: 2024
- Award amount: $150,000
- Project type: Standard Grant