
GOALI: Frameworks: At-Scale Heterogeneous Data based Adaptive Development Platform for Machine-Learning Models for Material and Chemical Discovery

Basic Information

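Award number:
2311632
Principal investigator:
Stefano Martiniani
Amount:
$4.5 million
Host institution:
New York University
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2023
Funding country:
United States
Project period:
2023-10-01 to 2028-09-30
Project status:
Ongoing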

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2311632&HistoricalAwards=false
Keywords:
GOALI Frameworks Scale Heterogeneous Data

Project Abstract

This project seeks to establish a new technological paradigm and the software infrastructure necessary for the development of Machine Learning (ML) models capable of predicting the properties of unseen molecular and materials systems/structures, thus enabling modeling of atomic behavior and the computational discovery of new molecules and materials at significantly higher throughput than afforded by existing first principles (quantum) methods. ML-enabled materials discovery is poised to play a critical role in addressing modern societal challenges such as energy sustainability and, as such, the technology and infrastructure developed by this project are expected to have a transformative impact across many scientific and engineering domains. The platform facilitates access, sharing, and discovery of vast amounts of first principles and experimental data, removing inefficiencies and accelerating scientific discovery by enabling the development of ML models on a scale previously inaccessible. To achieve these goals, this project is carried out in partnership with Amazon Web Services (AWS), providing the necessary know-how for the development of specialized open-source tools for training ML models at scale. This project is committed to the advancement of diversity, equity and inclusiveness in higher education, and as such it incorporates a variety of mechanisms to include underrepresented and low-income students (high-school and undergraduate) in its research activities across the four participating universities (New York University, University of Minnesota, University of Florida, and Brigham Young University), in addition to the mentoring of graduate students, the development of teaching materials, and workshops aimed at industrial outreach and training. 
To assure alignment between the platform/software and community needs, this project is supported by an Advisory Board of experts in cyberinfrastructure development, machine learning, material and chemical sciences, and STEM outreach who evaluate and provide strategic advice to the PIs.
The key technological advance that serves as the basis of this work is "foundation models", an approach for building ML systems in which a model trained on extremely large amounts of diverse and easily available data can be adapted to diverse applications with a small amount of additional model fitting (fine-tuning). This project thus focuses on the development of a foundation model, called FERMat, for molecular and material property prediction, and ML interatomic potentials for modeling atomic behavior. FERMat is to be delivered via an integrated adaptive platform in the form of a software package and an online framework for developing and deploying specialized ML models for materials and chemistry applications, called "FERMat Apps". In collaboration with AWS, this project seeks to develop open-source software for training foundation models like FERMat at scale on large amounts of highly heterogeneous and multi-modal data. The high data needs will be met by leveraging and significantly expanding the ColabFit Exchange, an online repository of first principles and experimental data optimized for training of ML models, in cooperation with a large number of materials and molecular data repositories, standards organizations, and existing cyberinfrastructures. FERMat and any ML model derived from it are designed to support uncertainty quantification (based on information geometry, Bayesian, and frequentist approaches) to ensure the robustness of predictions.
As guiding target applications, this project considers two problems of scientific interest: 2D material driven catalysis and the prediction of molecular crystal polymorphs.
This award by the Office of Advanced Cyberinfrastructure is jointly supported by the Division of Materials Research within the Directorate for Mathematical and Physical Sciences.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
