CDS&E: HAM3R: Heterogeneous Automated Management of Multiscale Methods and Resources
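Basic information
- Award number: 2204011
- Principal investigator:
- Amount: $499,900
- Host institution:
- Host institution country: United States
- Project category: Standard Grant
- Fiscal year: 2022
- Funding country: United States
- Period: 2022-08-01 to 2025-07-31
- Project status: Ongoing
- Source:
- Keywords:
Project abstract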
To run large and detailed simulations, scientists need to resolve physics at different time scales. Separating each scale into individual simulations and coupling them together greatly lowers the execution time. The process of coupling the simulation is laborious, complex, and error-prone. Achieving performance and energy efficiency requires domain scientists to have deep knowledge of computer architectures and systems, and extensively tune the algorithms and resource requests. The current state-of-the-practice of manual coupling and optimization leads to duplication of programming efforts and solutions that are not portable to new problems and sizes, and to computer architectures and systems. This interdisciplinary project builds technologies, namely HAM3R (Heterogeneous Automated Management of Multiscale Methods and Resources), to automate the coupling and optimize the computing and energy efficiency for multiscale simulations. HAM3R is broadly applicable to multiscale problems in computational chemistry, physics, biology, and materials science. Greater simplicity and flexibility of simulation codes have broad impacts on computational science by reducing the entry barrier to domain scientists. Enabling domain researchers to leverage advanced cyberinfrastructure will accelerate scientific throughput, which can have transformative effects across a spectrum of disciplines. This project broadens the engagement of students and underrepresented groups and user communities. This project develops a versatile software framework (named HAM3R) for predictive resource, workload, fault, and power management that enables automated optimization of performance, scalability, and energy efficiency of multiscale models on heterogeneous HPC systems. 
HAM3R is a transformative software framework that removes the complexities of coupling multiscale simulations from domain scientists by enabling dynamic coupling of multiscale models that combines an API, library, and runtime to support broad coupling styles and domains. The analysis of computation and data bottlenecks yields enhanced analytical and machine-learned performance models. HAM3R will equip multiscale models with automated resource allocation -- including predictive load-balancing by proactive management of computation, communication, and data movement -- to enable scalability and efficient simulations on heterogeneous HPC systems. This project’s Intellectual Merit advances the following areas: (1) data-centric optimizations to reduce the cost of data motion intra- and inter-scale by migrating computation and using lossy data compression; (2) customized local recovery for process failures; (3) model-predictive load balancing schemes within and across scales that support intelligent resource management and dynamic adaption; and (4) advanced power management across heterogeneous devices to ensure energy-efficient execution. The project demonstrates the capabilities of HAM3R in two different popular multiscale modeling frameworks: coupled molecular dynamics and lattice Boltzmann method simulations via domain decomposition; and coupled dissipative particle dynamics and finite element method simulations via heterogeneous multiscale methods. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outputs
Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Implementation of a ternary lattice Boltzmann model in LAMMPS
- DOI: 10.1016/j.cpc.2023.108898
- Publication date: 2023
- Journal:
- Impact factor: 6.3
- Authors: Arumugam Kumar, Gokul Raman; Andrews, James P.; Schiller, Ulf D.
- Corresponding author: Schiller, Ulf D.
Deep neural operator for learning transient response of interpenetrating phase composites subject to dynamic loading
- DOI: 10.1007/s00466-023-02343-6
- Publication date: 2023-03
- Journal:
- Impact factor: 4.1
- Authors: Minglei Lu; Ali Mohammadi; Zhaoxu Meng; Xuhui Meng; Gang Li; Zhen Li
- Corresponding authors: Minglei Lu; Ali Mohammadi; Zhaoxu Meng; Xuhui Meng; Gang Li; Zhen Li
Other publications by Jon Calhoun
Recovering Detectable Uncorrectable Errors via Spatial Data Prediction
- DOI: 10.1145/3624062.3624120
- Publication date: 2023
- Journal:
- Impact factor: 0
- Authors: Kristen Guernsey; Sarah Placke; Alexandra Poulos; Jon Calhoun
- Corresponding author: Jon Calhoun
Multifacets of lossy compression for scientific data in the Joint-Laboratory of Extreme Scale Computing
- DOI:
- Publication date: 2024
- Journal:
- Impact factor: 0
- Authors: Franck Cappello; Sheng Di; Robert Underwood; Dingwen Tao; Jon Calhoun; Yoshii Kazutomo; Kento Sato; Amarjit Singh; Luc Giraud; Emmanuel Agullo; Xavier Yepes; Mario Acosta; Sian Jin; Jiannan Tian; Frédéric Vivien; Bo Zhang; Kentaro Sano; Tomohiro Ueno; Thomas Grützmacher; H. Anzt
- Corresponding author: H. Anzt
Evaluating the Resiliency of Posits for Scientific Computing
- DOI:
- Publication date: 2023
- Journal:
- Impact factor: 0
- Authors: Benjamin Schlueter; Jon Calhoun; Alexandra Poulos
- Corresponding author: Alexandra Poulos
Lossy and Lossless Compression for BioFilm Optical Coherence Tomography (OCT)
- DOI: 10.1145/3624062.3625125
- Publication date: 2023
- Journal:
- Impact factor: 0
- Authors: M. Faykus; Jon Calhoun; Melissa C. Smith
- Corresponding author: Melissa C. Smith
Other grants by Jon Calhoun
CAREER: Dynamic Management of Compressed Arrays for High-Performance Computing Applications
- Award number: 1943114
- Fiscal year: 2020
- Funding amount: $499,900
- Project category: Continuing Grant
SHF: Small: Using Error-Bounded Lossy Compression to Improve High-Performance Computing Systems and Applications
- Award number: 1910197
- Fiscal year: 2019
- Funding amount: $499,900
- Project category: Standard Grant