Collaborative Research: IIS: III: MEDIUM: Learning Protein-ish: Foundational Insight on Protein Language Models for Better Understanding, Democratized Access, and Discovery
Basic Information
- Award number: 2310113
- Principal investigator: Amarda Shehu
- Amount: $599,900
- Institution country: United States
- Award type: Standard Grant
- Fiscal year: 2023
- Funding country: United States
- Period: 2023-08-01 to 2026-07-31
- Status: Active (not yet concluded)
Project Abstract
Large language models are massive neural networks that learn rich contextual representations of words and use such representations to address a variety of tasks in natural language processing (NLP). These models are a prominent example of generative artificial intelligence and are emerging as promising approaches for distilling and organizing the content of massive biological databases and for predicting a wide range of molecular bio-properties. Yet, we know surprisingly little about what these models capture in their learned representations, why they perform well on some tasks and not on others, and how they can produce deep insight into the relationships describing the biological space. If progress in NLP is any indication, the current trend of improving the performance of language models by drastically increasing the number of their trainable parameters is unsustainable both for our carbon footprint and for ensuring equity/accessibility of research and scholarship in the academic setting. This project advances algorithmic research at the intersection of information integration and informatics using principled protein language models (PLMs) as computational vehicles for deeper insight into the structural, functional, and evolutionary organization across protein space at varying levels of detail and scale. It also aims to do so in a way that is resource-aware, sustainable, and accessible to all researchers. The research activities are organized in three thrusts: (1) encoding prior biological knowledge in PLMs for joint and resource-aware learning in composite spaces, (2) revealing fundamental properties and organizing the learned representation space to inform and connect what is captured with properties of interest, and (3) enabling PLMs to capture diverse contexts for deeper exploration of the structural, functional, and evolutionary organization across protein space. 
This interdisciplinary approach contributes to the fields of machine learning, bioinformatics, and molecular biology and provides opportunities at the interface of these disciplines for training under-represented students of all levels. The investigators are determined to bridge communities and disciplines, and they have planned activities to build and galvanize a trans-disciplinary community to further advance their research. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Publications by Amarda Shehu
An Evolutionary Search Algorithm to Guide Stochastic Search for Near-Native Protein Conformations with Multiobjective Analysis
- Year: 2013
- Authors: Brian S. Olson; Amarda Shehu
- Corresponding author: Amarda Shehu
Molecules in motion: Computing structural flexibility
- Year: 2008
- Author: Amarda Shehu
- Corresponding author: Amarda Shehu
Structure- and Energy-based Analysis of Small Molecule Ligand Binding to Steroid Nuclear Receptors
- Year: 2023
- Authors: Megan Herceg; Amarda Shehu
- Corresponding author: Amarda Shehu
On the characterization of protein native state ensembles
- DOI: 10.1529/biophysj.106.094409
- Year: 2007
- Impact factor: 3.4
- Authors: Amarda Shehu; L. Kavraki; C. Clementi
- Corresponding author: C. Clementi
Reconstructing and mining protein energy landscape to understand disease
- DOI: 10.1109/bibm.2017.8217619
- Year: 2017
- Authors: Wanli Qiao; T. Maximova; X. Fang; E. Plaku; Amarda Shehu
- Corresponding author: Amarda Shehu
Other Grants by Amarda Shehu
Collaborative Research: Conference: Large Language Models for Biological Discoveries (LLMs4Bio)
- Award number: 2411529
- Fiscal year: 2024
- Amount: $599,900
- Award type: Standard Grant
Collaborative Research: IIBR: Innovation: Bioinformatics: Linking Chemical and Biological Space: Deep Learning and Experimentation for Property-Controlled Molecule Generation
- Award number: 2318829
- Fiscal year: 2023
- Amount: $599,900
- Award type: Continuing Grant
Intergovernmental Personnel Act
- Award number: 1948645
- Fiscal year: 2019
- Amount: $599,900
- Award type: Intergovernmental Personnel Award
Collaborative: SI2-SSE - A Plug-and-Play Software Platform of Robotics-Inspired Algorithms for Modeling Biomolecular Structures and Motions
- Award number: 1440581
- Fiscal year: 2015
- Amount: $599,900
- Award type: Standard Grant
Travel Awards for 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM-2015)
- Award number: 1543744
- Fiscal year: 2015
- Amount: $599,900
- Award type: Standard Grant
CCF: AF: Small: Novel Stochastic Optimization Algorithms to Advance the Treatment of Dynamic Molecular Systems
- Award number: 1421001
- Fiscal year: 2014
- Amount: $599,900
- Award type: Standard Grant
Workshop: 2014 NSF CISE CAREER Proposal Writing Workshop
- Award number: 1415210
- Fiscal year: 2013
- Amount: $599,900
- Award type: Standard Grant
CAREER: Probabilistic Methods for Addressing Complexity and Constraints in Protein Systems
- Award number: 1144106
- Fiscal year: 2012
- Amount: $599,900
- Award type: Continuing Grant
AF: Small: A Unified Computational Framework to Enhance the Ab-Initio Sampling of Native-Like Protein Conformations
- Award number: 1016995
- Fiscal year: 2010
- Amount: $599,900
- Award type: Standard Grant
Similar National Natural Science Foundation of China (NSFC) Grants
Research on Quantum Field Theory without a Lagrangian Description
- Award number: 24ZR1403900
- Approval year: 2024
- Amount: CNY 0
- Category: Provincial/municipal project
Cell Research
- Award number: 31224802
- Approval year: 2012
- Amount: CNY 240,000
- Category: Special fund project
Cell Research
- Award number: 31024804
- Approval year: 2010
- Amount: CNY 240,000
- Category: Special fund project
Cell Research
- Award number: 30824808
- Approval year: 2008
- Amount: CNY 240,000
- Category: Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
- Award number: 10774081
- Approval year: 2007
- Amount: CNY 450,000
- Category: General Program
Similar Overseas Grants
Collaborative Research: IIS Core: Small: World Values of Conversational AI and the Consequences for Human-AI Interaction
- Award number: 2230466
- Fiscal year: 2023
- Amount: $599,900
- Award type: Standard Grant
Collaborative Research: IIS Core: Small: World Values of Conversational AI and the Consequences for Human-AI Interaction
- Award number: 2230467
- Fiscal year: 2023
- Amount: $599,900
- Award type: Standard Grant
Collaborative Research: CISE-MSI: DP: IIS: Event Detection and Knowledge Extraction via Learning and Causality Analysis for Resilience Emergency Response
- Award number: 2219615
- Fiscal year: 2023
- Amount: $599,900
- Award type: Standard Grant
Collaborative Research: IIS-III Towards Fair Outlier Detection
- Award number: 2310482
- Fiscal year: 2023
- Amount: $599,900
- Award type: Standard Grant
Collaborative Research: CISE-MSI: DP: IIS RI: Research Capacity Expansion via Development of AI Based Algorithms for Optimal Management of Electric Vehicle Transactions with Grid
- Award number: 2318611
- Fiscal year: 2023
- Amount: $599,900
- Award type: Standard Grant
Collaborative Research: IIS-III: Small Towards Fair Outlier Detection
- Award number: 2310481
- Fiscal year: 2023
- Amount: $599,900
- Award type: Standard Grant
Collaborative Research: IIS: III: MEDIUM: Learning Protein-ish: Foundational Insight on Protein Language Models for Better Understanding, Democratized Access, and Discovery
- Award number: 2310114
- Fiscal year: 2023
- Amount: $599,900
- Award type: Standard Grant
Collaborative Research: CISE-MSI: DP: IIS RI: Research Capacity Expansion via Development of AI Based Algorithms for Optimal Management of Electric Vehicle Transactions with Grid
- Award number: 2318612
- Fiscal year: 2023
- Amount: $599,900
- Award type: Standard Grant
Collaborative Research: CISE-MSI: DP: IIS: Event Detection and Knowledge Extraction via Learning and Causality Analysis for Resilience Emergency Response
- Award number: 2219614
- Fiscal year: 2023
- Amount: $599,900
- Award type: Standard Grant
Collaborative Research: IIS: HCC: Small: The New Gatekeepers: Content Moderation and Information Threats in Local Communities
- Award number: 2207834
- Fiscal year: 2022
- Amount: $599,900
- Award type: Standard Grant