CSR: Medium: Pythia: An Application Analysis and Online Modeling Based Prediction Framework for Scalable Resource Management


Basic Information

Project Abstract

Computer applications that process large amounts of information are becoming common in a variety of science domains, such as High-Speed Physics, Economics, Genomics, Astronomy, and Meteorology. The overall goal of this project is to design software tools and technologies that support such applications efficiently on advanced computing systems. The hardware used to build such systems often comprises different types of resources, e.g., conventional computer processors running alongside specialized graphics processing units (GPUs), and this heterogeneity presents a major challenge when running applications at the required large scale. A better understanding of the applications' behavior on emerging hardware is key to sustaining these systems. To this end, the project designs and develops Pythia, software that models and predicts how applications will behave on given hardware. This information is then used to utilize resources better and to achieve scalable, high-performance computing systems.

The intellectual value of this research lies in three intermediate research goals:
1) Design an accurate application classifier that uses compile-time program analysis to capture workflow behavior and application characteristics, and that provides detailed insights into expected runtime application interactions.
2) Design and develop an accurate simulation model that incorporates workflow and application characteristics into a heuristics engine to predict how an application will perform under given conditions and resources.
3) Design a distributed, flexible, efficient, and easy-to-use online oracle framework that captures infrastructure heterogeneity and integrates with live systems to predict application behavior, which in turn can help guide application-attuned resource scheduling and management.

Completion of the project will create tools and technologies for realizing more efficient and scalable computing systems.
This work impacts a broad range of disciplines that regularly employ high-performance large-scale computing systems, especially for data-driven discovery. Consequently, use of Pythia will reduce the time-to-solution for modern and emerging applications, and therefore directly affect our way of life. The educational activities, which include recruiting and mentoring women and minority students, will help produce graduates with highly marketable skill sets. The integration of the research discoveries and software tools, which will be open source and made public, into the educational curriculum will help capture the interest of the next generation of computer scientists.

Project Outcomes

Journal articles (30)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
iez: Resource Contention Aware Load Balancing for Large-Scale Parallel File Systems
Toward Transparent Data Management in Multi-Layer Storage Hierarchy of HPC Systems
Toward scalable monitoring on large-scale storage for software defined cyberinfrastructure
CSLIM: Automated Extraction of IoT Functionalities from Legacy C Codebases
Improving I/O Performance of HPC Application Using Intra-Job Scheduling

Other Grants by Ali Butt

Collaborative Research: CNS Core: Medium: HardLambda: A new FaaS Abstraction for Cross-Stack Resource Management in Disaggregated Datacenters
  • Award number:
    2106634
  • Fiscal year:
    2021
  • Award amount:
    $750,000
  • Award type:
    Standard Grant
SPX: Collaborative Research: Cross-stack Memory Optimizations for Boosting I/O Performance of Deep Learning HPC Applications
  • Award number:
    1919113
  • Fiscal year:
    2019
  • Award amount:
    $750,000
  • Award type:
    Standard Grant
Workshop on Data Storage Research Vision
  • Award number:
    1829096
  • Fiscal year:
    2018
  • Award amount:
    $750,000
  • Award type:
    Standard Grant
CSR: Small: Collaborative Research: Scalable Fine-Grained Cloud Monitoring for Empowering IoT
  • Award number:
    1615411
  • Fiscal year:
    2016
  • Award amount:
    $750,000
  • Award type:
    Standard Grant
Student Travel Support for IEEE 23rd International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2015)
  • Award number:
    1541504
  • Fiscal year:
    2015
  • Award amount:
    $750,000
  • Award type:
    Standard Grant
DC: Small: Collaborative Research: Exploring Energy-Reliability Trade-offs in Data Storage Systems
  • Award number:
    1016408
  • Fiscal year:
    2010
  • Award amount:
    $750,000
  • Award type:
    Standard Grant
Increasing Student Participation in Cluster Computing through IEEE Cluster 2010 Attendance
  • Award number:
    1049858
  • Fiscal year:
    2010
  • Award amount:
    $750,000
  • Award type:
    Standard Grant
CSR: Small: Towards Realizing Cloud HPC: An Adaptive Programming Model for Accelerator-based Clusters
  • Award number:
    1016793
  • Fiscal year:
    2010
  • Award amount:
    $750,000
  • Award type:
    Standard Grant
U.S. - Pakistan International Planning Visit: Economical Computing Substrate for Developing Regions
  • Award number:
    0940048
  • Fiscal year:
    2009
  • Award amount:
    $750,000
  • Award type:
    Standard Grant
CAREER: A Scalable Hierarchical Framework for High-Performance Data Storage
  • Award number:
    0746832
  • Fiscal year:
    2008
  • Award amount:
    $750,000
  • Award type:
    Continuing Grant

Similar Overseas Grants

Collaborative Research: CyberTraining: Implementation: Medium: Training Users, Developers, and Instructors at the Chemistry/Physics/Materials Science Interface
  • Award number:
    2321102
  • Fiscal year:
    2024
  • Award amount:
    $750,000
  • Award type:
    Standard Grant
RII Track-4:@NASA: Bluer and Hotter: From Ultraviolet to X-ray Diagnostics of the Circumgalactic Medium
  • Award number:
    2327438
  • Fiscal year:
    2024
  • Award amount:
    $750,000
  • Award type:
    Standard Grant
Collaborative Research: Topological Defects and Dynamic Motion of Symmetry-breaking Tadpole Particles in Liquid Crystal Medium
  • Award number:
    2344489
  • Fiscal year:
    2024
  • Award amount:
    $750,000
  • Award type:
    Standard Grant
Collaborative Research: AF: Medium: The Communication Cost of Distributed Computation
  • Award number:
    2402836
  • Fiscal year:
    2024
  • Award amount:
    $750,000
  • Award type:
    Continuing Grant
Collaborative Research: AF: Medium: Foundations of Oblivious Reconfigurable Networks
  • Award number:
    2402851
  • Fiscal year:
    2024
  • Award amount:
    $750,000
  • Award type:
    Continuing Grant
Collaborative Research: CIF: Medium: Snapshot Computational Imaging with Metaoptics
  • Award number:
    2403122
  • Fiscal year:
    2024
  • Award amount:
    $750,000
  • Award type:
    Standard Grant
Collaborative Research: SHF: Medium: Differentiable Hardware Synthesis
  • Award number:
    2403134
  • Fiscal year:
    2024
  • Award amount:
    $750,000
  • Award type:
    Standard Grant
Collaborative Research: SHF: Medium: Enabling Graphics Processing Unit Performance Simulation for Large-Scale Workloads with Lightweight Simulation Methods
  • Award number:
    2402804
  • Fiscal year:
    2024
  • Award amount:
    $750,000
  • Award type:
    Standard Grant
Collaborative Research: CIF-Medium: Privacy-preserving Machine Learning on Graphs
  • Award number:
    2402815
  • Fiscal year:
    2024
  • Award amount:
    $750,000
  • Award type:
    Standard Grant
Collaborative Research: SHF: Medium: Tiny Chiplets for Big AI: A Reconfigurable-On-Package System
  • Award number:
    2403408
  • Fiscal year:
    2024
  • Award amount:
    $750,000
  • Award type:
    Standard Grant