III: Small: Integrated prediction of intrinsic disorder and disorder functions with modular multi-label deep learning

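Basic information

  • Award number:
    2125218
  • Principal investigator:
  • Amount:
    $500,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Period:
    2021-10-01 to 2024-09-30
  • Project status:
    Completed

Project abstract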

Proteins are remarkable biological machines. Hundreds of millions of protein sequences have been decoded over the last two decades, creating a significant knowledge gap: we do not know what most of them do. A common way to decipher protein functions relies on the sequence-to-structure-to-function paradigm, where protein function is learned from the protein structure that is produced from the sequence. However, recent research has identified a large family of intrinsically disordered proteins that lack a stable structure under physiological conditions and therefore cannot be characterized using structure-based approaches. These proteins are particularly abundant in eukaryotes and are involved in the pathogenesis of numerous human diseases. The discovery of intrinsically disordered proteins has prompted the development of a new generation of computational methods that predict the presence of intrinsic disorder directly from protein sequences. A recently completed Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment has shown that these methods are fast and provide accurate results. However, while intrinsic disorder can be readily and accurately identified in protein sequences, its function remains a mystery. This proposal will conceptualize, design, implement, test and deploy an innovative machine learning method that provides highly accurate and integrated predictions of disorder and disorder functions directly from protein sequences. The team will utilize this method to produce functional annotations of disorder on an unprecedented scale of tens of millions of proteins, addressing the knowledge gap problem for this protein family. In the long run, this project will advance understanding of fundamental biological processes and related human health issues in the context of intrinsically disordered proteins. This project will also train STEM students and researchers via high-school outreach and multidisciplinary teaching and mentoring of undergraduate and graduate students and postdoctoral researchers, producing highly skilled researchers who are sought after by industry and academia.
The team addresses an interdisciplinary and challenging problem concerning intrinsically disordered proteins at the intersection of the bioinformatics and machine learning fields. Building on expertise in the computational analysis of intrinsic disorder and with a focus on technical innovation, this project will deliver a novel deep sequential multi-label transformer architecture that provides accurate predictions of disorder and disorder functions. The solution will be designed to accommodate the biological underpinnings of protein data, such as the inherently multi-label outcomes, imbalanced labels and sequential nature of protein data. Moreover, this architecture will feature a modular design to facilitate transfer to other areas of protein and nucleic acid bioinformatics. The resulting method will be extensively benchmarked and disseminated to maximize impact. The code will be deposited into relevant public repositories, and pre-computed functional annotations of intrinsic disorder will be made available using modern online resources, such as data repositories and webservers, in order to meet the needs of a broad spectrum of users including biologists, biochemists, biophysicists and bioinformaticians.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
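To make the "multi-label outcomes" and "imbalanced labels" mentioned in the abstract concrete, here is a minimal sketch of one common way to handle them: a per-residue binary cross-entropy in which each label's positive class is up-weighted by its negative-to-positive ratio. This is an illustration only, not the project's method. The label names, the toy protein, and both function names are hypothetical, and the sketch does not model the sequential transformer architecture itself.

```python
import math

# Hypothetical label set; the abstract does not enumerate the disorder functions.
LABELS = ["disorder", "protein_binding", "nucleic_acid_binding", "linker"]


def positive_weights(targets):
    """Per-label weight = (# negative residues) / (# positive residues)."""
    n = len(targets)
    weights = []
    for j in range(len(LABELS)):
        pos = sum(row[j] for row in targets)
        # Fall back to 1.0 when a label never occurs, to avoid division by zero.
        weights.append((n - pos) / pos if pos else 1.0)
    return weights


def weighted_multilabel_bce(probs, targets, weights, eps=1e-7):
    """Mean binary cross-entropy over residues and labels, positives up-weighted."""
    total = 0.0
    for p_row, t_row in zip(probs, targets):
        for p, t, w in zip(p_row, t_row, weights):
            p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite
            total -= w * t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / (len(targets) * len(LABELS))


# Toy 6-residue protein: one row per residue, one 0/1 column per label.
# A residue can carry several labels at once (multi-label outcome).
targets = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
weights = positive_weights(targets)

# An uninformative predictor that outputs 0.5 for every residue and label.
uniform = [[0.5] * len(LABELS) for _ in targets]
loss = weighted_multilabel_bce(uniform, targets, weights)
```

In this toy example the rare `linker` label receives the largest weight, so a missed linker residue costs more than a missed residue of the common `disorder` label. Other remedies for imbalance, such as resampling or focal losses, would fit the same interface.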

Project outcomes

Journal articles (9)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Resources for computational prediction of intrinsic disorder in proteins
  • DOI:
    10.1016/j.ymeth.2022.03.018
  • Published:
    2022-05-27
  • Journal:
  • Impact factor:
    4.8
  • Authors:
    Kurgan, Lukasz
  • Corresponding author:
    Kurgan, Lukasz
Overview Update: Computational Prediction of Intrinsic Disorder in Proteins
  • DOI:
    10.1002/cpz1.802
  • Published:
    2023-06-01
  • Journal:
  • Impact factor:
    0
  • Authors:
    Uversky, Vladimir N.; Kurgan, Lukasz
  • Corresponding author:
    Kurgan, Lukasz

Other publications by Lukasz Kurgan

Corrigendum to: Comprehensive review and empirical analysis of hallmarks of DNA-, RNA- and protein-binding residues in protein chains
  • DOI:
    10.1093/bib/bbz102
  • Published:
    2020
  • Journal:
  • Impact factor:
    9.5
  • Authors:
    Jian Zhang; Zhiqiang Ma; Lukasz Kurgan
  • Corresponding author:
    Lukasz Kurgan
Tutorial: a guide for the selection of fast and accurate computational tools for the prediction of intrinsic disorder in proteins
  • DOI:
    10.1038/s41596-023-00876-x
  • Published:
    2023
  • Journal:
  • Impact factor:
    14.8
  • Authors:
    Lukasz Kurgan; Gang Hu; Kui Wang; Sina Ghadermarzi; Bi Zhao; Nawar Malhis; Gábor Erdős; Jörg Gsponer; Vladimir N. Uversky; Zsuzsanna Dosztányi
  • Corresponding author:
    Zsuzsanna Dosztányi
Computational prediction of MoRFs, short disorder-to-order transitioning protein binding regions
Supervised Learning: Statistical Methods
  • DOI:
    10.1007/978-0-387-36795-8_11
  • Published:
    2007
  • Journal:
  • Impact factor:
    0
  • Authors:
    K. Cios; R. Swiniarski; W. Pedrycz; Lukasz Kurgan
  • Corresponding author:
    Lukasz Kurgan
Improved prediction of residue flexibility by embedding optimized amino acid grouping into RSA-based linear models
  • DOI:
    10.1007/s00726-014-1817-9
  • Published:
    2014-08
  • Journal:
  • Impact factor:
    3.5
  • Authors:
    Hua Zhang; Lukasz Kurgan
  • Corresponding author:
    Lukasz Kurgan

Other grants by Lukasz Kurgan

Collaborative Research: Identification and Structural Modeling of Intrinsically Disordered Protein-Protein and Protein-Nucleic Acids Interactions
  • Award number:
    2146027
  • Fiscal year:
    2022
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
III: Small: High-Throughput Annotation of Cellular Functions of Intrinsic Disorder in Proteins
  • Award number:
    1617369
  • Fiscal year:
    2016
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant

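Similar National Natural Science Foundation of China (NSFC) grants

Forensic application of circadian small RNAs in estimating the time of bloodstain formation
  • Award number:
  • Approval year:
    2024
  • Funding amount:
    CNY 0
  • Project type:
    Provincial/municipal project
Mechanism by which tRNA-derived small RNA upregulates the YBX1/CCL5 pathway in bortezomib-induced chronic pain
  • Award number:
    n/a
  • Approval year:
    2022
  • Funding amount:
    CNY 100,000
  • Project type:
    Provincial/municipal project
Small RNA regulation of the type I-F CRISPR-Cas adaptive immune response and its molecular mechanism
  • Award number:
    32000033
  • Approval year:
    2020
  • Funding amount:
    CNY 240,000
  • Project type:
    Young Scientists Fund
Mechanisms by which small RNAs regulate the biocontrol function of Bacillus amyloliquefaciens FZB42
  • Award number:
    31972324
  • Approval year:
    2019
  • Funding amount:
    CNY 580,000
  • Project type:
    General Program
Mechanisms by which Streptococcus mutans small RNAs link LuxS quorum sensing to biofilm formation
  • Award number:
    81900988
  • Approval year:
    2019
  • Funding amount:
    CNY 210,000
  • Project type:
    Young Scientists Fund
Molecular mechanism of pigeon milk secretion elucidated by small RNA sequencing
  • Award number:
    31802058
  • Approval year:
    2018
  • Funding amount:
    CNY 260,000
  • Project type:
    Young Scientists Fund
Functions and mechanisms of key gut bacterial small RNAs in the onset and progression of Crohn's disease
  • Award number:
    31870821
  • Approval year:
    2018
  • Funding amount:
    CNY 560,000
  • Project type:
    General Program
Pathogenic mechanism of rice grassy stunt virus regulated by small RNA-mediated DNA methylation
  • Award number:
    31772128
  • Approval year:
    2017
  • Funding amount:
    CNY 600,000
  • Project type:
    General Program
Immunoregulatory mechanism of acupuncture treatment for Hashimoto's thyroiditis based on small RNA-seq
  • Award number:
    81704176
  • Approval year:
    2017
  • Funding amount:
    CNY 200,000
  • Project type:
    Young Scientists Fund
Regulation of small RNA biogenesis by rice OsSGS3 and OsHEN1 and its effect on disease resistance
  • Award number:
    91640114
  • Approval year:
    2016
  • Funding amount:
    CNY 850,000
  • Project type:
    Major Research Plan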

Similar overseas grants

Collaborative Research: NSF-AoF: CIF: Small: AI-assisted Waveform and Beamforming Design for Integrated Sensing and Communication
  • Award number:
    2326622
  • Fiscal year:
    2024
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
SHF: Small: Semi-supervised Learning for Design and Quality Assurance of Integrated Circuits
  • Award number:
    2334380
  • Fiscal year:
    2024
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
CC* Integration-Small: M2- NET: An Integrated Access and Backhaul Millimeter-wave Wireless Network for Campus Connectivity and Research
  • Award number:
    2346621
  • Fiscal year:
    2024
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
Collaborative Research: NSF-AoF: CIF: Small: AI-assisted Waveform and Beamforming Design for Integrated Sensing and Communication
  • Award number:
    2326621
  • Fiscal year:
    2024
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
Integrated fragment-based phenotypic screening and chemoproteomics for identification of novel small cell lung cancer-specific targets
  • Award number:
    10577507
  • Fiscal year:
    2023
  • Funding amount:
    $500,000
  • Project type:
SHF: Small: Explainable Machine Learning for Better Design of Very Large Scale Integrated Circuits
  • Award number:
    2322713
  • Fiscal year:
    2023
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
SHF: Small: Testing and Design-for-Test Techniques for Monolithic 3D Integrated Circuits
  • Award number:
    2309822
  • Fiscal year:
    2023
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
Development of hybrid thermal model for small bodies integrated shape and roughness models
  • Award number:
    23K03478
  • Fiscal year:
    2023
  • Funding amount:
    $500,000
  • Project type:
    Grant-in-Aid for Scientific Research (C)
FET: Small: An Integrated Framework for the Optimal Control of Open Quantum Systems --- Theory, Quantum Algorithms, and Applications
  • Award number:
    2312456
  • Fiscal year:
    2023
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
Integrated blood and radiomic subtyping to guide immunotherapy treatment selection and early response assessment in metastatic non-small cell lung cancer
  • Award number:
    10734127
  • Fiscal year:
    2023
  • Funding amount:
    $500,000
  • Project type: