Integrative deep learning algorithms for understanding protein sequence-structure-function relationships: representation, prediction, and discovery
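Basic information
- Grant number: 10712082
- Principal investigator:
- Amount: $362,900
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2023
- Funding country: United States
- Project period: 2023-09-05 to 2028-07-31
- Project status: Ongoing
- Source: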
- Keywords: 3-Dimensional; Acceleration; Algorithms; Amino Acid Sequence; Amino Acids; Area; Artificial Intelligence; Biological; Biological Sciences; Biology; Biomedical Engineering; Biotechnology; Computer software; Computing Methodologies; Data; Data Set; Directed Molecular Evolution; Disease; Goals; Health; Heterogeneity; Human; Intelligence; Knowledge; Learning; Measurement; Mechanics; Modeling; Noise; Ontology; Property; Protein Analysis; Protein Engineering; Proteins; Research; Shapes; Structure; Structure-Activity Relationship; Systems Integration; Uncertainty; Variant; analytical tool; artificial intelligence algorithm; artificial intelligence method; computational platform; computer framework; data resource; deep learning; deep learning algorithm; design; drug discovery; functional genomics; genome-wide; high dimensionality; human disease; improved; large scale data; machine learning method; next generation sequencing; novel; personalized diagnostics; personalized medicine; precision medicine; protein function; protein structure; protein structure prediction; statistical learning; structural genomics; synthetic biology; three dimensional structure; vaccine development
Project abstract
Understanding the sequence-structure-function relationship of proteins is of vital importance to protein biology,
biomedicine, and bioengineering. Recent advances in biotechnology have been generating rich datasets to
characterize proteins, such as next-generation sequencing data, three-dimensional (3D) structures, ontology
annotations, and measurements of functional activities, yet how to computationally operationalize these datasets
to fully unveil the structural or functional mechanisms of proteins remains a significant challenge. Existing
computational methods often struggle with the size, high-dimensionality, heterogeneity, incompleteness, and
intrinsic noise of those data, limiting our ability to study protein biology from a holistic, integrated systems view.
The goal of this research is to develop new artificial intelligence (AI) methods for effectively integrating and
intelligently modeling heterogeneous protein-related datasets and to advance our understanding of the
mechanistic connections between proteins’ sequence, structure, and function. This project not only represents
timely research that leverages the unprecedented opportunities offered by recent AI breakthroughs such as
AlphaFold, but also goes beyond these efforts from protein structure prediction to systematic analyses of protein
biology and unlocks new analytic frameworks that could not be realized previously. Specifically, we will first
develop novel machine learning methods to learn statistical representations that are grounded in the sequence
and structure of proteins and reflect their functional properties. The learned representations will allow us to
characterize how the composition of amino acids and the 3D shape of protein structure determine the function
of a protein. Second, we will develop unified, biology-guided deep learning frameworks to integrate domain
knowledge, such as structural properties and evolutionary relationships, and study several key problems for
characterizing protein functions, including genome-scale function annotation and variant effect prediction. These
efforts will shift the classic sequence-first paradigm of previous studies to a new integrative paradigm and provide
accurate, robust, and interpretable predictions of protein functions. Finally, we will develop a computational
platform that combines data-efficient AI models, uncertainty-guided exploration algorithms, and deep learning-
based generative models for AI-aided directed evolution and sequence-structure co-design of proteins, which
will assist and accelerate the discovery and design of functional proteins. Overall, this proposal will study the
sequence-structure-function relationship of proteins from an integrative perspective, provide new state-of-the-art
AI algorithms with applications in fundamental problems for understanding protein function and human disease,
and generate new actionable biological hypotheses for the discovery and design of novel functional proteins.
The resulting software and data resources will be publicly available through open-access platforms.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Yunan Luo
Similar grants from the National Natural Science Foundation of China (NSFC)
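ODE analysis of the predictive centripetal acceleration algorithm and its application to minimax optimization
- Grant number:
- Approval year: 2025
- Funding amount: CNY 0
- Project category: Provincial/municipal project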
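Wideband MEMS acceleration sensors and integrated acquisition-computation technology and applications
- Grant number: 2025JK2016
- Approval year: 2025
- Funding amount: CNY 0
- Project category: Provincial/municipal project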
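Coordinated control of displacement and acceleration in seismically isolated structures based on hydraulic inerter damping
- Grant number:
- Approval year: 2025
- Funding amount: CNY 100,000
- Project category: Provincial/municipal project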
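Vector vibration acceleration sensing based on eccentric fiber cladding gratings
- Grant number:
- Approval year: 2024
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund project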
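Algorithms for separating disturbance acceleration in ultra-compact absolute gravimeters
- Grant number:
- Approval year: 2024
- Funding amount: CNY 0
- Project category: Provincial/municipal project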
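Graphene fiber-optic accelerometers based on cavity optomechanical effects
- Grant number: 62305039
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund project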
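High-precision microcavity optomechanical accelerometers based on self-sustained coherent amplification
- Grant number: 52305621
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund project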
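Seismic design methods and resilience assessment of high-rise steel frame structures with self-centering braces under dual displacement-acceleration control
- Grant number: 52308484
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund project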
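Multi-field coupling characteristics and cage fracture failure mechanisms of planetary gear set needle roller bearings under high centrifugal acceleration
- Grant number: 52305047
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund project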
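Integration of low-cost GNSS and accelerometers for structural and seismic motion monitoring
- Grant number: 42311530062
- Approval year: 2023
- Funding amount: CNY 100,000
- Project category: International (Regional) Cooperation and Exchange project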
Similar overseas grants
Shared and Distributed Memory Parallel Pre-Conditioning and Acceleration Algorithms for "Spline-Enhanced" Spatial Discretisations
- Grant number: 2907459
- Fiscal year: 2023
- Funding amount: $362,900
- Project category: Studentship
Efficient algorithms and succinct data structures for acceleration of telescoping and related problems
- Grant number: RGPIN-2021-03147
- Fiscal year: 2022
- Funding amount: $362,900
- Project category: Discovery Grants Program - Individual
Acceleration framework for training deep learning by cooperative with algorithms and computer architectures
- Grant number: 21K17768
- Fiscal year: 2021
- Funding amount: $362,900
- Project category: Grant-in-Aid for Early-Career Scientists
Efficient algorithms and succinct data structures for acceleration of telescoping and related problems
- Grant number: RGPIN-2021-03147
- Fiscal year: 2021
- Funding amount: $362,900
- Project category: Discovery Grants Program - Individual
Material and Device Building Blocks for Hardware Acceleration of Machine Learning and Artificial Intelligence Algorithms
- Grant number: 2004791
- Fiscal year: 2020
- Funding amount: $362,900
- Project category: Continuing Grant
CIF: Small: Collaborative Research: Acceleration Algorithms for Large-scale Nonconvex Optimization
- Grant number: 1909291
- Fiscal year: 2019
- Funding amount: $362,900
- Project category: Standard Grant
Acceleration of trigger algorithms with FPGAs at the LHC implemented using higher-level programming languages
- Grant number: ST/S005560/1
- Fiscal year: 2019
- Funding amount: $362,900
- Project category: Training Grant
CIF: Small: Collaborative Research: Acceleration Algorithms for Large-scale Nonconvex Optimization
- Grant number: 1909298
- Fiscal year: 2019
- Funding amount: $362,900
- Project category: Standard Grant
Acceleration of trigger algorithms with FPGAs at the LHC implemented using higher-level programming languages
- Grant number: 2348748
- Fiscal year: 2019
- Funding amount: $362,900
- Project category: Studentship
OAC Core: Small: Enabling High-fidelity Turbulent Reacting-Flow Simulations through Advanced Algorithms, Code Acceleration, and High-order Methods for Extreme-scale Computing
- Grant number: 1909379
- Fiscal year: 2019
- Funding amount: $362,900
- Project category: Standard Grant