Collaborative Research: OAC Core: Small: Anomaly Detection and Performance Optimization for End-to-End Data Transfers at Scale
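Basic Information
- Award number: 2412329
- Principal investigator:
- Amount: $275,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2023
- Funding country: United States
- Period: 2023-10-01 to 2024-07-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract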
Despite continuous efforts and investments to upgrade the networking infrastructure of research and education institutions to meet the needs of large-scale science applications, data transfers on these networks often perform very poorly. Understanding the underlying reasons for poor transfer performance is important yet challenging due to the sophisticated design of today's cyberinfrastructures. This project offers a set of novel models and algorithms to detect and mitigate performance issues of data transfers in research networks. The proposed suite of tools helps researchers and system administrators pinpoint the root cause of performance problems in data transfers so that necessary actions can be taken swiftly to minimize their impact on ongoing transfers. The project will also integrate the research into all levels of education, including science projects with K-12 students, development of new curriculum modules for graduate- and undergraduate-level courses, and summer workshops specifically for minority groups.
Understanding the true underlying reasons for poor transfer performance is key to mitigating them and delivering the promised transfer speeds. However, the involvement of multiple end systems, dynamically changing background traffic, and the complexity of today's networking infrastructures turn this into a complicated and time-consuming process. This project develops a novel anomaly-detection and performance-optimization framework for end-to-end data transfers at scale. The framework helps to predict, understand, diagnose, and optimize wide-area file transfers in today's extreme-scale cyberinfrastructures. To achieve this goal, it derives deep-neural-network-based predictive models that can relate transfer settings to throughput. These models are then used to estimate the optimal configuration for new transfers. The framework also gathers performance metrics for end-system and network resources periodically to keep track of system utilization. When a transfer anomaly is detected, the collected metrics are fed into anomaly-classification models to identify the root causes. Once the underlying reasons for performance problems are identified, the framework launches a real-time optimization process to reconfigure the transfer settings so that the impact of anomalies can be alleviated.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Engin Arslan
Scattering analysis of ultrathin barrier (< 7 nm) GaN-based heterostructures
- DOI: 10.1007/s00339-019-2591-z
- Published: 2019-03-30
- Journal:
- Impact factor: 2.800
- Authors: Polat Narin; Engin Arslan; Mehmet Ozturk; Mustafa Ozturk; Sefer Bora Lisesivdin; Ekmel Ozbay
- Corresponding author: Ekmel Ozbay
Demystifying the Performance of Data Transfers in High-Performance Research Networks
- DOI: 10.1109/e-science58273.2023.10254940
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Ehsan Saeedizade; Bing Zhang; Engin Arslan
- Corresponding author: Engin Arslan
HARP: Predictive Transfer Optimization Based on Historical Analysis and Real-Time Probing
- DOI:
- Published: 2016
- Journal:
- Impact factor: 0
- Authors: Engin Arslan; Kemal Guner; T. Kosar
- Corresponding author: T. Kosar
Network management game
- DOI: 10.1145/2427036.2427045
- Published: 2011
- Journal:
- Impact factor: 0
- Authors: Engin Arslan; M. Yuksel; M. H. Gunes
- Corresponding author: M. H. Gunes
Energy-performance trade-offs in data transfer tuning at the end-systems
- DOI: 10.1016/j.suscom.2014.08.004
- Published: 2014
- Journal:
- Impact factor: 0
- Authors: I. Alan; Engin Arslan; T. Kosar
- Corresponding author: T. Kosar
Other grants by Engin Arslan
Elements: Adaptive End-to-End Parallelism for Distributed Science Workflows
- Award number: 2427408
- Fiscal year: 2024
- Funding amount: $275,000
- Project type: Standard Grant
CAREER: Efficient and Reliable Data Transfer Services for Next Generation Research Networks
- Award number: 2348281
- Fiscal year: 2023
- Funding amount: $275,000
- Project type: Continuing Grant
Elements: Adaptive End-to-End Parallelism for Distributed Science Workflows
- Award number: 2209955
- Fiscal year: 2022
- Funding amount: $275,000
- Project type: Standard Grant
CAREER: Efficient and Reliable Data Transfer Services for Next Generation Research Networks
- Award number: 2145742
- Fiscal year: 2022
- Funding amount: $275,000
- Project type: Continuing Grant
Collaborative Research: OAC Core: Small: Anomaly Detection and Performance Optimization for End-to-End Data Transfers at Scale
- Award number: 2007789
- Fiscal year: 2020
- Funding amount: $275,000
- Project type: Standard Grant
CRII: OAC: Online Optimization of End-to-End Data Transfers in High Performance Networks
- Award number: 1850353
- Fiscal year: 2019
- Funding amount: $275,000
- Project type: Standard Grant
Similar National Natural Science Foundation of China (NSFC) grants
Research on Quantum Field Theory without a Lagrangian Description
- Award number: 24ZR1403900
- Approval year: 2024
- Funding amount: CNY 0
- Project type: Provincial/municipal project
Cell Research
- Award number: 31224802
- Approval year: 2012
- Funding amount: CNY 240,000
- Project type: Special fund project
Cell Research
- Award number: 31024804
- Approval year: 2010
- Funding amount: CNY 240,000
- Project type: Special fund project
Cell Research (细胞研究)
- Award number: 30824808
- Approval year: 2008
- Funding amount: CNY 240,000
- Project type: Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
- Award number: 10774081
- Approval year: 2007
- Funding amount: CNY 450,000
- Project type: General Program
Similar overseas grants
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
- Award number: 2403312
- Fiscal year: 2024
- Funding amount: $275,000
- Project type: Standard Grant
Collaborative Research: OAC CORE: Federated-Learning-Driven Traffic Event Management for Intelligent Transportation Systems
- Award number: 2414474
- Fiscal year: 2024
- Funding amount: $275,000
- Project type: Standard Grant
Collaborative Research: OAC Core: Large-Scale Spatial Machine Learning for 3D Surface Topology in Hydrological Applications
- Award number: 2414185
- Fiscal year: 2024
- Funding amount: $275,000
- Project type: Standard Grant
Collaborative Research: OAC Core: Learning AI Surrogate of Large-Scale Spatiotemporal Simulations for Coastal Circulation
- Award number: 2402947
- Fiscal year: 2024
- Funding amount: $275,000
- Project type: Standard Grant
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
- Award number: 2403313
- Fiscal year: 2024
- Funding amount: $275,000
- Project type: Standard Grant
Collaborative Research: OAC Core: Learning AI Surrogate of Large-Scale Spatiotemporal Simulations for Coastal Circulation
- Award number: 2402946
- Fiscal year: 2024
- Funding amount: $275,000
- Project type: Standard Grant
Collaborative Research: OAC Core: CropDL - Scheduling and Checkpoint/Restart Support for Deep Learning Applications on HPC Clusters
- Award number: 2403088
- Fiscal year: 2024
- Funding amount: $275,000
- Project type: Standard Grant
Collaborative Research: OAC Core: CropDL - Scheduling and Checkpoint/Restart Support for Deep Learning Applications on HPC Clusters
- Award number: 2403090
- Fiscal year: 2024
- Funding amount: $275,000
- Project type: Standard Grant
Collaborative Research: OAC: Core: Harvesting Idle Resources Safely and Timely for Large-scale AI Applications in High-Performance Computing Systems
- Award number: 2403399
- Fiscal year: 2024
- Funding amount: $275,000
- Project type: Standard Grant
Collaborative Research: OAC Core: CropDL - Scheduling and Checkpoint/Restart Support for Deep Learning Applications on HPC Clusters
- Award number: 2403089
- Fiscal year: 2024
- Funding amount: $275,000
- Project type: Standard Grant