GOALI: Frameworks: At-Scale Heterogeneous Data based Adaptive Development Platform for Machine-Learning Models for Material and Chemical Discovery
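Basic information
- Award number: 2311632
- Principal investigator:
- Amount: $4.5 million
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2023
- Funding country: United States
- Project period: 2023-10-01 to 2028-09-30
- Project status: Active
- Source:
- Keywords: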
Project abstract
This project seeks to establish a new technological paradigm and the software infrastructure necessary for the development of Machine Learning (ML) models capable of predicting the properties of unseen molecular and materials systems/structures, thus enabling modeling of atomic behavior and the computational discovery of new molecules and materials at significantly higher throughput than afforded by existing first principles (quantum) methods. ML-enabled materials discovery is poised to play a critical role in addressing modern societal challenges such as energy sustainability and, as such, the technology and infrastructure developed by this project are expected to have a transformative impact across many scientific and engineering domains. The platform facilitates access, sharing, and discovery of vast amounts of first principles and experimental data, removing inefficiencies and accelerating scientific discovery by enabling the development of ML models on a scale previously inaccessible. To achieve these goals, this project is carried out in partnership with Amazon Web Services (AWS), providing the necessary know-how for the development of specialized open-source tools for training ML models at scale. This project is committed to the advancement of diversity, equity and inclusiveness in higher education, and as such it incorporates a variety of mechanisms to include underrepresented and low-income students (high-school and undergraduate) in its research activities across the four participating universities (New York University, University of Minnesota, University of Florida, and Brigham Young University), in addition to the mentoring of graduate students, the development of teaching materials, and workshops aimed at industrial outreach and training. 
To ensure alignment between the platform/software and community needs, this project is supported by an Advisory Board of experts in cyberinfrastructure development, machine learning, material and chemical sciences, and STEM outreach who evaluate and provide strategic advice to the PIs.
The key technological advance that serves as the basis of this work is "foundation models", an approach for building ML systems in which a model trained on extremely large amounts of diverse and easily available data can be adapted to diverse applications with a small amount of additional model fitting (fine-tuning). This project thus focuses on the development of a foundation model, called FERMat, for molecular and material property prediction, and ML interatomic potentials for modeling atomic behavior. FERMat is to be delivered via an integrated adaptive platform in the form of a software package and an online framework for developing and deploying specialized ML models for materials and chemistry applications, called "FERMat Apps". In collaboration with AWS, this project seeks to develop open-source software for training foundation models like FERMat at scale on large amounts of highly heterogeneous and multi-modal data. The high data needs will be met by leveraging and significantly expanding the ColabFit Exchange, an online repository of first principles and experimental data optimized for training of ML models, in cooperation with a large number of materials and molecular data repositories, standards organizations, and existing cyberinfrastructures. FERMat and any ML model derived from it are designed to support uncertainty quantification (based on information geometry, Bayesian, and frequentist approaches) to ensure the robustness of predictions.
As guiding target applications, this project considers two problems of scientific interest: 2D-material-driven catalysis and the prediction of molecular crystal polymorphs.
This award by the Office of Advanced Cyberinfrastructure is jointly supported by the Division of Materials Research within the Directorate for Mathematical and Physical Sciences.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Stefano Martiniani
Transport and Energetics of Bacterial Rectification
- DOI:
- Year: 2023
- Journal:
- Impact factor: 0
- Authors: Satyam Anand; Xiaolei Ma; Shuo Guo; Stefano Martiniani; Xiang Cheng
- Corresponding author: Xiang Cheng
Monte Carlo sampling for stochastic weight functions
- DOI: 10.1073/pnas.1620497114
- Year: 2016
- Journal:
- Impact factor: 0
- Authors: D. Frenkel; K. J. Schrenk; Stefano Martiniani
- Corresponding author: Stefano Martiniani
On the complexity of energy landscapes: algorithms and a direct test of the Edwards conjecture
- DOI:
- Year: 2017
- Journal:
- Impact factor: 0
- Authors: Stefano Martiniani
- Corresponding author: Stefano Martiniani
Structural analysis of high-dimensional basins of attraction.
- DOI: 10.1103/physreve.94.031301
- Year: 2016
- Journal:
- Impact factor: 0
- Authors: Stefano Martiniani; K. J. Schrenk; J. Stevenson; D. Wales; D. Frenkel
- Corresponding author: D. Frenkel
Vicsek model by time-interlaced compression: A dynamical computable information density.
- DOI: 10.1103/physreve.103.062141
- Year: 2020
- Journal:
- Impact factor: 0
- Authors: A. Cavagna; P. Chaikin; D. Levine; Stefano Martiniani; A. Puglisi; M. Viale
- Corresponding author: M. Viale
Other grants awarded to Stefano Martiniani
EAGER: Quantifying the error landscape of deep neural networks
- Award number: 2226387
- Fiscal year: 2022
- Funding amount: $4.5 million
- Project type: Standard Grant
EAGER: Quantifying the error landscape of deep neural networks
- Award number: 2132995
- Fiscal year: 2021
- Funding amount: $4.5 million
- Project type: Standard Grant
Similar international grants
CAREER: Novel Parallelization Frameworks for Large-Scale Network Optimization with Combinatorial Requirements: Solution Methods and Applications
- Award number: 2338641
- Fiscal year: 2024
- Funding amount: $4.5 million
- Project type: Standard Grant
Frameworks: arXiv as an accessible large-scale open research platform
- Award number: 2311521
- Fiscal year: 2024
- Funding amount: $4.5 million
- Project type: Standard Grant
Collaborative Research: Frameworks: Scalable Performance and Accuracy analysis for Distributed and Extreme-scale systems (SPADE)
- Award number: 2311707
- Fiscal year: 2023
- Funding amount: $4.5 million
- Project type: Standard Grant
Collaborative Research: Frameworks: Scalable Performance and Accuracy analysis for Distributed and Extreme-scale systems (SPADE)
- Award number: 2311708
- Fiscal year: 2023
- Funding amount: $4.5 million
- Project type: Standard Grant
Multiscale computational frameworks for integrating large-scale cortical dynamics, connectivity, and behavior
- Award number: 10840682
- Fiscal year: 2023
- Funding amount: $4.5 million
- Project type:
Collaborative Research: Frameworks: Scalable Performance and Accuracy analysis for Distributed and Extreme-scale systems (SPADE)
- Award number: 2311709
- Fiscal year: 2023
- Funding amount: $4.5 million
- Project type: Standard Grant
Extension of Innovative Frameworks for Application Analysis in Post-Peta Scale Systems
- Award number: 22KK0182
- Fiscal year: 2023
- Funding amount: $4.5 million
- Project type: Fund for the Promotion of Joint International Research (Fostering Joint International Research (A))
Frameworks: Large Scale Atmospheric Research Using an Integrated WRF Modeling, Visualization, and Verification Container Framework (I-WRF)
- Award number: 2209711
- Fiscal year: 2022
- Funding amount: $4.5 million
- Project type: Standard Grant
Multiscale computational frameworks for integrating large-scale cortical dynamics, connectivity, and behavior
- Award number: 10263628
- Fiscal year: 2021
- Funding amount: $4.5 million
- Project type:
Novel Decomposition Techniques Enabling Scalable Computational Frameworks for Large-Scale Nonlinear Optimization Problems
- Award number: 2012410
- Fiscal year: 2020
- Funding amount: $4.5 million
- Project type: Standard Grant