III: Small: Integrated prediction of intrinsic disorder and disorder functions with modular multi-label deep learning

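Basic information

  • Award number:
    2125218
  • Principal investigator:
  • Amount:
    $500,000
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Period:
    2021-10-01 to 2024-09-30
  • Status:
    Completed

Project abstract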

Proteins are remarkable biological machines. Hundreds of millions of protein sequences have been decoded over the last two decades, creating a significant knowledge gap: we do not know what most of these proteins do. A common way to decipher protein function relies on the sequence-to-structure-to-function paradigm, in which function is inferred from the protein structure that is derived from the sequence. However, recent research has identified a large family of intrinsically disordered proteins that lack a stable structure under physiological conditions and therefore cannot be characterized with structure-based approaches. These proteins are particularly abundant in eukaryotes and are involved in the pathogenesis of numerous human diseases. Their discovery has prompted the development of a new generation of computational methods that predict the presence of intrinsic disorder directly from protein sequences. The recently completed Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment has shown that these methods are fast and accurate. However, while intrinsic disorder can be readily and accurately identified in protein sequences, its function remains a mystery. This project will conceptualize, design, implement, test and deploy an innovative machine learning method that provides highly accurate and integrated predictions of disorder and disorder functions directly from protein sequences. The team will use this method to produce functional annotations of disorder on an unprecedented scale of tens of millions of proteins, addressing the knowledge gap for this protein family. In the long run, this project will advance the understanding of fundamental biological processes and related human health issues in the context of intrinsically disordered proteins. This project will also train STEM students and researchers via high-school outreach and multidisciplinary teaching and mentoring of undergraduate and graduate students and postdoctoral researchers, producing highly skilled researchers who are sought after by industry and academia.
The team addresses an interdisciplinary and challenging problem concerning intrinsically disordered proteins, at the intersection of bioinformatics and machine learning. Building on expertise in the computational analysis of intrinsic disorder and with a focus on technical innovation, this project will deliver a novel deep sequential multi-label transformer architecture that provides accurate predictions of disorder and disorder functions. The solution will be designed to accommodate the biological underpinnings of protein data, such as inherently multi-label outcomes, imbalanced labels and the sequential nature of protein data. Moreover, this architecture will feature a modular design to facilitate transfer to other areas of protein and nucleic acid bioinformatics. The resulting method will be extensively benchmarked and disseminated to maximize impact. The code will be deposited into relevant public repositories, and pre-computed functional annotations of intrinsic disorder will be made available through modern online resources, such as data repositories and webservers, to meet the needs of a broad spectrum of users including biologists, biochemists, biophysicists and bioinformaticians.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
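The abstract names two properties of the prediction task, multi-label outcomes and imbalanced labels, without describing how the project handles them. One common way to treat both at once is a per-label binary cross-entropy in which positives of rare labels are up-weighted. The following is a minimal sketch of that general technique, not the project's method; the function names, the example label columns (disorder, protein binding, linker) and the toy data are all hypothetical.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def positive_weights(labels):
    """Per-label weight = (#negatives / #positives), so rare labels count more.

    `labels` is a list of per-residue rows, each a list of 0/1 flags.
    Labels with no positives fall back to a weight of 1.0.
    """
    n_rows = len(labels)
    weights = []
    for j in range(len(labels[0])):
        pos = sum(row[j] for row in labels)
        weights.append((n_rows - pos) / pos if pos > 0 else 1.0)
    return weights


def weighted_multilabel_bce(logits, labels, pos_weight):
    """Mean binary cross-entropy over all (residue, label) pairs.

    Each label gets its own independent sigmoid, so a residue may carry
    several labels at once. Uses -log(sigmoid(z)) = softplus(-z) and
    -log(1 - sigmoid(z)) = softplus(z).
    """
    total, count = 0.0, 0
    for logit_row, label_row in zip(logits, labels):
        for z, y, w in zip(logit_row, label_row, pos_weight):
            total += w * y * softplus(-z) + (1 - y) * softplus(z)
            count += 1
    return total / count


# Hypothetical toy data: 4 residues x 3 labels
# (columns: disorder, protein binding, linker).
toy_labels = [[1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 0, 1]]
toy_logits = [[0.0, 0.0, 0.0] for _ in toy_labels]  # an uninformative model
toy_weights = positive_weights(toy_labels)
toy_loss = weighted_multilabel_bce(toy_logits, toy_labels, toy_weights)
```

With these weights, a missed positive for a rare function label costs more than a missed positive for the common disorder label, which counteracts the tendency of an unweighted loss to ignore rare labels.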

Project outputs

Journal articles (9)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Surveying over 100 predictors of intrinsic disorder in proteins
  • DOI:
    10.1080/14789450.2021.2018304
  • Publication date:
    2021-12-29
  • Journal:
  • Impact factor:
    3.4
  • Authors:
    Zhao,Bi;Kurgan,Lukasz
  • Corresponding author:
    Kurgan,Lukasz
Resources for computational prediction of intrinsic disorder in proteins
  • DOI:
    10.1016/j.ymeth.2022.03.018
  • Publication date:
    2022-05-27
  • Journal:
  • Impact factor:
    4.8
  • Authors:
    Kurgan,Lukasz
  • Corresponding author:
    Kurgan,Lukasz
Overview Update: Computational Prediction of Intrinsic Disorder in Proteins
  • DOI:
    10.1002/cpz1.802
  • Publication date:
    2023-06-01
  • Journal:
  • Impact factor:
    0
  • Authors:
    Uversky,Vladimir N.;Kurgan,Lukasz
  • Corresponding author:
    Kurgan,Lukasz

Other publications by Lukasz Kurgan

Corrigendum to: Comprehensive review and empirical analysis of hallmarks of DNA-, RNA- and protein-binding residues in protein chains
  • DOI:
    10.1093/bib/bbz102
  • Publication date:
    2020
  • Journal:
  • Impact factor:
    9.5
  • Authors:
    Jian Zhang;Zhiqiang Ma;Lukasz Kurgan
  • Corresponding author:
    Lukasz Kurgan
Improved prediction of residue flexibility by embedding optimized amino acid grouping into RSA-based linear models
  • DOI:
    10.1007/s00726-014-1817-9
  • Publication date:
    2014-08
  • Journal:
  • Impact factor:
    3.5
  • Authors:
    Hua Zhang;Lukasz Kurgan
  • Corresponding author:
    Lukasz Kurgan
Supervised Learning: Statistical Methods
  • DOI:
    10.1007/978-0-387-36795-8_11
  • Publication date:
    2007
  • Journal:
  • Impact factor:
    0
  • Authors:
    K. Cios;R. Swiniarski;W. Pedrycz;Lukasz Kurgan
  • Corresponding author:
    Lukasz Kurgan
Systematic investigation of sequence and structural motifs that recognize ATP
  • DOI:
    10.1016/j.compbiolchem.2015.04.008
  • Publication date:
    2015-06
  • Journal:
  • Impact factor:
    3.1
  • Authors:
    Ke Chen;Dacheng Wang;Lukasz Kurgan
  • Corresponding author:
    Lukasz Kurgan
Machine learning in the life sciences

Other grants by Lukasz Kurgan

Collaborative Research: Identification and Structural Modeling of Intrinsically Disordered Protein-Protein and Protein-Nucleic Acids Interactions
  • Award number:
    2146027
  • Fiscal year:
    2022
  • Funding amount:
    $500,000
  • Award type:
    Standard Grant
III: Small: High-Throughput Annotation of Cellular Functions of Intrinsic Disorder in Proteins
  • Award number:
    1617369
  • Fiscal year:
    2016
  • Funding amount:
    $500,000
  • Award type:
    Standard Grant

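Similar grants from the National Natural Science Foundation of China (NSFC)

Mechanism by which Podoplanin regulates the EGFR signaling pathway to mediate targeted-therapy resistance in NTRK fusion-positive non-small cell lung cancer
  • Award number:
    82373044
  • Approval year:
    2023
  • Funding amount:
    CNY 490,000
  • Award type:
    General Program
Intelligent identification of and response strategies for operational risks of small and micro enterprises using fused multi-source heterogeneous data
  • Award number:
    72301188
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund
Predicting immunotherapy efficacy in non-small cell lung cancer with Delta deep fusion learning on multi-timepoint, multimodal molecular imaging
  • Award number:
    82371994
  • Approval year:
    2023
  • Funding amount:
    CNY 500,000
  • Award type:
    General Program
Development and preliminary application of novel radiotracers targeting the ALK fusion protein in non-small cell lung cancer
  • Award number:
    22376125
  • Approval year:
    2023
  • Funding amount:
    CNY 500,000
  • Award type:
    General Program
A near-real-time model of the low-latitude small-scale ionosphere that fuses multi-source space- and ground-based data with physical mechanisms
  • Award number:
  • Approval year:
    2022
  • Funding amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund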

Similar overseas grants

FET: III: Small: Innovative Approaches for Bias Correction and Systems-level Analysis in Integrated Multi-omics Data
  • Award number:
    2203236
  • Fiscal year:
    2022
  • Funding amount:
    $500,000
  • Award type:
    Continuing Grant
III: Small: QUEST: An Integrated Query and Event System on Noisy Streams and Tables
  • Award number:
    1319600
  • Fiscal year:
    2013
  • Funding amount:
    $500,000
  • Award type:
    Continuing Grant
III: Small: Integrated Digital Event Archiving and Library (IDEAL)
  • Award number:
    1319578
  • Fiscal year:
    2013
  • Funding amount:
    $500,000
  • Award type:
    Continuing Grant
III: Small: Topology-based approaches to integrated analysis of transcriptomic, protein interactomic and phenotypic data
  • Award number:
    1218201
  • Fiscal year:
    2012
  • Funding amount:
    $500,000
  • Award type:
    Continuing Grant
III: Small: AegisDB: Integrated Real-Time Geo-Stream Processing and Monitoring System: A Data-Type-Based Approach
  • Award number:
    1017926
  • Fiscal year:
    2010
  • Funding amount:
    $500,000
  • Award type:
    Standard Grant