EAGER: Using Machine Learning to Increase the Operational Efficiency of Large Distributed Systems
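Basic information
- Award number: 1649087
- Principal investigator:
- Amount: $300,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2016
- Funding country: United States
- Period: 2016-09-01 to 2019-08-31
- Project status: Completed
- Source:
- Keywords:
Project abstract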
Large distributed systems are now ubiquitous and form part of sustainable IT solutions for a broad range of customers and applications. Data centers in the private or public cloud and high performance computing systems are two examples of complex, highly distributed systems: the former are used by almost everyone on a daily basis, while the latter are used by computational scientists to advance science and engineering. High availability and reliability of these complex systems are important for the quality of the user experience. Efficient management of such systems contributes to their availability and reliability, and relies on a priori knowledge of the timing of the collective demands of users and of certain performance measures (e.g., usage, temperature, power) of various system components. This project aims to provide a systematic methodology to improve the operational efficiency of complex, distributed systems by developing neural networks that can efficiently and accurately predict the incoming workload at fine and coarse time scales. Such workload prediction can dramatically improve the operational efficiency of data centers and high performance computing systems by driving proactive management strategies that specifically aim to enhance reliability. For data centers, the focus is on reducing automatically triggered performance tickets by proactively managing virtual machine resizing and migration. For high performance computing systems, the focus is on predicting hardware faults to autonomically improve the scheduler's efficiency, direct cooling, and improve performance and memory bandwidth.
Project outcomes
Journal papers (8)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Fault Site Pruning for Practical Reliability Analysis of GPGPU Applications
- DOI: 10.1109/micro.2018.00066
- Published: 2018-10
- Journal:
- Impact factor: 0
- Authors: Bin Nie; Lishan Yang; Adwait Jog; E. Smirni
- Corresponding authors: Bin Nie; Lishan Yang; Adwait Jog; E. Smirni
Efficient Deep Neural Network Serving: Fast and Furious
- DOI: 10.1109/tnsm.2018.2808352
- Published: 2018-02
- Journal:
- Impact factor: 5.3
- Authors: Feng Yan; Yuxiong He; Olatunji Ruwase; E. Smirni
- Corresponding authors: Feng Yan; Yuxiong He; Olatunji Ruwase; E. Smirni
How to Supercharge the Amazon T2: Observations and Suggestions
- DOI: 10.1109/cloud.2017.43
- Published: 2017-06
- Journal:
- Impact factor: 0
- Authors: Feng Yan; Lihua Ren; Daniel J. Dubois; G. Casale; Jiawei Wen; E. Smirni
- Corresponding authors: Feng Yan; Lihua Ren; Daniel J. Dubois; G. Casale; Jiawei Wen; E. Smirni
CEDULE: A Scheduling Framework for Burstable Performance in Cloud Computing
- DOI: 10.1109/icac.2018.00024
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Ali, Ahsan; Pinciroli, Riccardo; Yan, Feng; Smirni, Evgenia
- Corresponding author: Smirni, Evgenia
Spatial–Temporal Prediction Models for Active Ticket Managing in Data Centers
- DOI: 10.1109/tnsm.2018.2794409
- Published: 2018-01
- Journal:
- Impact factor: 5.3
- Authors: Ji Xue; R. Birke; L. Chen; E. Smirni
- Corresponding authors: Ji Xue; R. Birke; L. Chen; E. Smirni
Other publications by Evgenia Smirni
A regression-based analytic model for capacity planning of multi-tier applications
- DOI: 10.1007/s10586-008-0052-0
- Published: 2008-03-25
- Journal:
- Impact factor: 4.100
- Authors: Qi Zhang; Ludmila Cherkasova; Ningfang Mi; Evgenia Smirni
- Corresponding author: Evgenia Smirni
Scheduling data analytics work with performance guarantees: queuing and machine learning models in synergy
- DOI: 10.1007/s10586-016-0563-z
- Published: 2016-04-23
- Journal:
- Impact factor: 4.100
- Authors: Ji Xue; Feng Yan; Alma Riska; Evgenia Smirni
- Corresponding author: Evgenia Smirni
Other grants by Evgenia Smirni
EAGER: Epidemic Spread Modeling Using Hard Data
- Award number: 2130681
- Fiscal year: 2021
- Award amount: $300,000
- Project type: Standard Grant
BIGDATA: IA: Collaborative Research: Protecting Yourself from Wildfire Smoke: Big Data-Driven Adaptive Air Quality Prediction Methodologies
- Award number: 1838022
- Fiscal year: 2019
- Award amount: $300,000
- Project type: Standard Grant
SHF-Small: Robust Methodologies for Effective Data Center Management
- Award number: 1218758
- Fiscal year: 2012
- Award amount: $300,000
- Project type: Standard Grant
CPA-ACR-CSA: Effective Resource Allocation under Temporal Dependence
- Award number: 0811417
- Fiscal year: 2008
- Award amount: $300,000
- Project type: Standard Grant
CSR-SMA: Autocorrelated Flows in Systems: Analytic Models and Applications
- Award number: 0720699
- Fiscal year: 2007
- Award amount: $300,000
- Project type: Continuing Grant
ITR-(ASE)-(dmc+int): Reconfigurable, Data-driven Resource Allocation in Complex Systems: Practice and Theoretical Foundations
- Award number: 0428330
- Fiscal year: 2004
- Award amount: $300,000
- Project type: Standard Grant
Collaborative Research: Adaptive Data Parallel Storage
- Award number: 0090221
- Fiscal year: 2001
- Award amount: $300,000
- Project type: Continuing Grant
Effective Techniques and Tools for Resource Management in Clustered Web Servers
- Award number: 0098278
- Fiscal year: 2001
- Award amount: $300,000
- Project type: Continuing Grant
Next Generation Software: Coordinated Allocation of Processor and I/O Resources in Parallel Systems
- Award number: 9974992
- Fiscal year: 1999
- Award amount: $300,000
- Project type: Continuing Grant
Similar NSFC grants
Molecular Interaction Reconstruction of Rheumatoid Arthritis Therapies Using Clinical Data
- Award number: 31070748
- Approval year: 2010
- Award amount: CNY 340,000
- Project type: General Program
Similar overseas grants
EAGER: North American Monsoon Prediction Using Causality Informed Machine Learning
- Award number: 2313689
- Fiscal year: 2023
- Award amount: $300,000
- Project type: Standard Grant
EAGER: Using machine learning to develop a calibrated, remote sensing-based age model to improve late Quaternary slip-rate estimates in arid environments
- Award number: 2233310
- Fiscal year: 2022
- Award amount: $300,000
- Project type: Standard Grant
Collaborative Research: EAGER: Generation of High Resolution Surface Melting Maps over Antarctica using Regional Climate Models, Remote Sensing and Machine Learning
- Award number: 2136938
- Fiscal year: 2022
- Award amount: $300,000
- Project type: Standard Grant
Collaborative Research: EAGER: Generation of High Resolution Surface Melting Maps over Antarctica Using Regional Climate Models, Remote Sensing and Machine Learning
- Award number: 2136940
- Fiscal year: 2022
- Award amount: $300,000
- Project type: Standard Grant
EAGER: Using machine learning to develop a calibrated, remote sensing-based age model to improve late Quaternary slip-rate estimates in arid environments
- Award number: 2210203
- Fiscal year: 2022
- Award amount: $300,000
- Project type: Standard Grant
Collaborative Research: EAGER: Generation of High Resolution Surface Melting Maps over Antarctica Using Regional Climate Models, Remote Sensing and Machine Learning
- Award number: 2136939
- Fiscal year: 2022
- Award amount: $300,000
- Project type: Standard Grant
EAGER: ADAPT: Hypotheses Generation in Heterogeneous Catalysis using Causal Inference and Machine Learning
- Award number: 2231174
- Fiscal year: 2022
- Award amount: $300,000
- Project type: Standard Grant
EAGER: Collaborative Research: Understanding Human Behaviors and Mental Health using Federated Machine Learning on Smart Phones
- Award number: 2041096
- Fiscal year: 2020
- Award amount: $300,000
- Project type: Standard Grant
EAGER: Collaborative Research: Understanding Human Behaviors and Mental Health using Federated Machine Learning on Smart Phones
- Award number: 2041065
- Fiscal year: 2020
- Award amount: $300,000
- Project type: Standard Grant
EAGER: Run-Time Hardware-Assisted Malware Detection Using Machine Learning
- Award number: 1936836
- Fiscal year: 2019
- Award amount: $300,000
- Project type: Standard Grant