CAREER: Capacity Planning Methodologies for Large Clusters with Heterogeneous Architectures and Diverse Applications
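Basic Information
- Award number: 1452751
- Principal investigator:
- Amount: $459,600
- Institution:
- Institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2015
- Funding country: United States
- Period: 2015-04-01 to 2022-03-31
- Status: Completed
- Source:
- Keywords:
Abstract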
This project focuses on developing innovative techniques and algorithms to build adequate system models and to support performance and reliability analysis, in order to better explain large-system behavior, predict application performance, and ensure high resource efficiency and system dependability. Large cluster environments are an important part of today's computing infrastructure, providing the platform for applications that handle core business and operational data. However, as computing and application infrastructure grows more complex and quality-of-service requirements become more demanding, large cluster environments face the difficult task of ensuring that applications are always available and deliver adequate performance. This project expects to develop new capacity planning techniques for performance modeling, workload measurement, and model parameterization of large cluster systems. Intelligent capacity and reliability modeling will enable service providers to determine the best platform for an application before deploying and running it. It will also enable system managers to optimize the performance, reliability, and efficiency of the entire cluster infrastructure. This research will develop new performance modeling methods that capture the characteristics of heterogeneous hardware architectures and predict the behavior of an application running on an array of computing platforms. The research will also extend performance modeling to be failure-aware. By capturing the characteristics of both system workloads and failure events, the improved models will enable accurate prediction of the performance and reliability of complex large-scale systems. In addition, the researchers will develop new advanced techniques to parameterize performance models with essential processing information about computational and communication components.
This information is not limited to mean values; it also includes other critical but complex features, such as resource contention and burstiness. The project includes educational activities that reach out to students from secondary school through graduate school to strongly motivate students, especially women, toward science and engineering, and it integrates this research into curriculum development and undergraduate research activities.
Project Outcomes
Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Performance and Consistency Analysis for Distributed Deep Learning Applications
- DOI: 10.1109/ipccc50635.2020.9391566
- Published: 2020-11
- Journal:
- Impact factor: 0
- Authors: Danlin Jia;M. Saha;J. Bhimani;N. Mi
- Corresponding authors: Danlin Jia;M. Saha;J. Bhimani;N. Mi
Other Publications by Ningfang Mi
Load balancing for cluster systems under heavy-tailed and temporal dependent workloads
- DOI: 10.1016/j.simpat.2014.03.006
- Published: 2014-05-01
- Journal:
- Impact factor:
- Authors: Jianzhe Tai;Zhen Li;Jiahui Chen;Ningfang Mi
- Corresponding author: Ningfang Mi
A regression-based analytic model for capacity planning of multi-tier applications
- DOI: 10.1007/s10586-008-0052-0
- Published: 2008-03-25
- Journal:
- Impact factor: 4.100
- Authors: Qi Zhang;Ludmila Cherkasova;Ningfang Mi;Evgenia Smirni
- Corresponding author: Evgenia Smirni
Performance impacts of autocorrelated flows in multi-tiered systems
- DOI: 10.1145/1328690.1328709
- Published: 2007-12
- Journal:
- Impact factor: 0
- Authors: Ningfang Mi
- Corresponding author: Ningfang Mi
Other Grants by Ningfang Mi
Collaborative Research: CNS core: OAC core: Small: New Techniques for I/O Behavior Modeling and Persistent Storage Device Configuration
- Award number: 2008072
- Fiscal year: 2020
- Amount: $459,600
- Award type: Standard Grant
CSR: EAGER: An Integrated Framework for Performance and Reliability in Large-scaled Computing Systems
- Award number: 1251129
- Fiscal year: 2012
- Amount: $459,600
- Award type: Standard Grant
Similar Overseas Grants
Planning Grant: Developing capacity to attract diverse students to the geosciences: A public relations framework
- Award number: 2326816
- Fiscal year: 2024
- Amount: $459,600
- Award type: Standard Grant
Planning: FIRE-PLAN: Building Wildland Fire Science Capacity in Alaska Through The University of Alaska Fairbanks Rural Campuses
- Award number: 2333423
- Fiscal year: 2024
- Amount: $459,600
- Award type: Standard Grant
Planning: Machine Learning in Transportation: Enhancing STEM Education and Research Capacity at The University of Texas at El Paso
- Award number: 2332774
- Fiscal year: 2023
- Amount: $459,600
- Award type: Standard Grant
Changing primary care capacity in Canada (4C): A cross-provincial mixed methods study to inform workforce planning
- Award number: 488915
- Fiscal year: 2023
- Amount: $459,600
- Award type: Operating Grants
Planning for Bioinformatics Capacity Building
- Award number: 487795
- Fiscal year: 2023
- Amount: $459,600
- Award type: Miscellaneous Programs
Planning: HBCU-UP: Strengthening Data Science Research Capacity and Education Programs through Academia-Industry Partnership
- Award number: 2332161
- Fiscal year: 2023
- Amount: $459,600
- Award type: Standard Grant
Collaborative Planning Grant: Building Capacity to Scale the Mentoring Math Scholars for Success Program
- Award number: 2221643
- Fiscal year: 2022
- Amount: $459,600
- Award type: Standard Grant
Collaborative Research: NNA Research: Developing capacity for planning and adapting to riverbank erosion and its consequences in the Yukon River Basin
- Award number: 2127443
- Fiscal year: 2022
- Amount: $459,600
- Award type: Standard Grant
Planning Grant: Workshops to Build Capacity for Biological Field Research in Southern California Ecosystems
- Award number: 2147764
- Fiscal year: 2022
- Amount: $459,600
- Award type: Standard Grant
Planning: Capacity Building for a New RII Track-1 in Vermont
- Award number: 2210512
- Fiscal year: 2022
- Amount: $459,600
- Award type: Standard Grant