UniProt: A Protein Sequence and Function Resource for Biomedical Science
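Basic Information
- Grant number: 10267787
- Principal investigator: Alex Bateman
- Amount: $3,833,200
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2014
- Funding country: United States
- Project period: 2014-09-18 to 2026-05-31
- Project status: Ongoing
- Source: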
- Keywords: Affect, Amino Acid Sequence, Artificial Intelligence, Biomedical Research, Catalogs, Cells, Collaborations, Communities, Complement, Complex, Cues, Data, Data Set, Development, Disease, Disease susceptibility, Distance Learning, Ensure, Environment, FAIR principles, Genomics, Genotype, Glean, Gold, Growth, Health, Hereditary Disease, Human, Human Genetics, Human Microbiome, Individual, International, Internet, Knowledge, Knowledge Extraction, Literature, Machine Learning, Methods, Modeling, Modernization, Molecular, Molecular Biology, Molecular Sequence Data, Molecular Structure, Ontology, Organ, Orthologous Gene, Outcome, Paper, Pathway interactions, Pattern, Pharmaceutical Preparations, Phenotype, Play, Process, Production, Protein Array, Proteins, Publications, Readability, Readiness, Research, Research Personnel, Resources, Role, Science, Shapes, Site, Standardization, Structural Protein, System, Technology, Time, Tissues, Training, Triage, Variant, Work, biomedical data science, biomedical resource, crowdsourcing, data access, data reuse, deep learning, design, experience, experimental study, formycin triphosphate, genetic architecture, genomic variation, hackathon, human disease, improved, innovation, knowledge base, learning strategy, machine learning method, macromolecular assembly, meetings, new technology, pathogen, personalized diagnostics, prognostic, protein function, response, social media, symposium, text searching, web site, webinar
PROJECT SUMMARY/ABSTRACT
This project continues the development of the UniProt Knowledgebase, which aims to provide the scientific
community with a comprehensive, high-quality, and freely accessible resource of protein sequences and
functional information. Proteins are an essential bridge between human genetics, the environment, and
phenotype. While human genetics is increasingly able to find correlations between genotype and phenotype,
the knowledge of protein function that UniProt provides is essential for the mechanistic understanding needed
to improve health outcomes through better and more personalized diagnostics, prognostics, and treatments.
Biomedical research is being revolutionized by methods from Artificial Intelligence, particularly Machine
Learning (ML) approaches such as Deep Learning (DL). These approaches now outperform humans on many tasks
and are the state of the art wherever sufficient training data are available. UniProt provides gold-standard
training data for hundreds of ML applications in biomedical research. The work in this proposal will make
UniProt data more ready for use in ML and will integrate ML methods into our own workflows to improve efficiency.
UniProt curators extract experimental knowledge about proteins from published papers and synthesize it in
human- and machine-readable forms using a range of standard ontologies. This proposal will further structure
protein knowledge in UniProt by developing complete, machine-readable catalogs of the functional impact of
human variation and of human protein networks and complexes, which are essential to understanding human
disease. Curation efficiency will be improved using DL models, developed in collaboration with text-mining
experts, that automate the identification of relevant papers and accelerate knowledge extraction. The extracted
knowledge will be validated by our expert curators and by the wider research community, whom we will actively
engage to further scale curation. ML approaches will also be used to infer annotations for proteins that lack
experimental characterization, with community challenges used to develop faster, more accurate, and scalable
methods for annotating the deluge of uncharacterized proteins.
UniProt is an exemplar FAIR (Findable, Accessible, Interoperable, Reusable) resource and has served the
scientific community with regular, predictable data releases despite exponential growth in data volumes.
Streamlined production processes will scale efficiently and sustainably with growing data volume and
complexity. We will explore novel technologies to ensure the continued timely release of data to the community
in accordance with the FAIR principles.
UniProt is an international hub for protein data that serves hundreds of thousands of users annually. We will
continue to use user-centric approaches to develop the UniProt website in response to user needs and new data
types. We will engage our stakeholders and collaborators by introducing an annual strategic partnership
meeting, and we will engage our communities through webinars, social media, hackathons, and attendance at
scientific meetings to broaden the efficient and impactful use of our data.
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Alex Bateman
Other grants by Alex Bateman
UniProt: A centralized protein sequence and function resource
- Grant number: 9114369
- Fiscal year: 2014
- Amount: $3,833,200
- Project category:
UniProt - Enhancing functional genomics data access for the Alzheimer's Disease (AD) and dementia-related protein research communities
- Grant number: 10121011
- Fiscal year: 2014
- Amount: $3,833,200
- Project category:
UniProt: A centralized protein sequence and function resource
- Grant number: 8739769
- Fiscal year: 2014
- Amount: $3,833,200
- Project category:
UniProt: A centralized protein sequence and function resource
- Grant number: 9069018
- Fiscal year: 2014
- Amount: $3,833,200
- Project category:
UniProt: A Protein Sequence and Function Resource for Biomedical Science
- Grant number: 10663983
- Fiscal year: 2014
- Amount: $3,833,200
- Project category:
UniProt - Protein sequence and function embeddings for AI/Machine Learning readiness
- Grant number: 10594115
- Fiscal year: 2014
- Amount: $3,833,200
- Project category:
UniProt: A centralized protein sequence and function resource
- Grant number: 9276092
- Fiscal year: 2014
- Amount: $3,833,200
- Project category:
UniProt: A Protein Sequence and Function Resource for Biomedical Science
- Grant number: 10490361
- Fiscal year: 2014
- Amount: $3,833,200
- Project category:
UniProt: A centralized protein sequence and function resource
- Grant number: 10372430
- Fiscal year: 2014
- Amount: $3,833,200
- Project category:
UniProt building community metrics for FAIR and TRUSTworthy resources
- Grant number: 10595850
- Fiscal year: 2014
- Amount: $3,833,200
- Project category:
Similar Overseas Grants
Cerebral infarction treatment strategy using collagen-like "triple helix peptide" containing functional amino acid sequence
- Grant number: 23K06972
- Fiscal year: 2023
- Amount: $3,833,200
- Project category: Grant-in-Aid for Scientific Research (C)
Establishment of a screening method for functional microproteins independent of amino acid sequence conservation
- Grant number: 23KJ0939
- Fiscal year: 2023
- Amount: $3,833,200
- Project category: Grant-in-Aid for JSPS Fellows
Effects of amino acid sequence and lipids on the structure and self-association of transmembrane helices
- Grant number: 19K07013
- Fiscal year: 2019
- Amount: $3,833,200
- Project category: Grant-in-Aid for Scientific Research (C)
Construction of electron-transfer amino acid sequence probe with an interaction for protein and cell
- Grant number: 16K05820
- Fiscal year: 2016
- Amount: $3,833,200
- Project category: Grant-in-Aid for Scientific Research (C)
Development of artificial antibody of anti-bitter taste receptor using random amino acid sequence library
- Grant number: 16K08426
- Fiscal year: 2016
- Amount: $3,833,200
- Project category: Grant-in-Aid for Scientific Research (C)
The aa15-17 amino acid sequence in the terminal protein domain of HBV polymerase as a viral factor affecting in vivo as well as in vitro replication activity of the virus.
- Grant number: 25461010
- Fiscal year: 2013
- Amount: $3,833,200
- Project category: Grant-in-Aid for Scientific Research (C)
Amino acid sequence analysis of fossil proteins using mass spectrometry
- Grant number: 23654177
- Fiscal year: 2011
- Amount: $3,833,200
- Project category: Grant-in-Aid for Challenging Exploratory Research
Precise hybrid synthesis of glycoprotein through amino acid sequence-specific introduction of oligosaccharide followed by enzymatic transglycosylation reaction
- Grant number: 22550105
- Fiscal year: 2010
- Amount: $3,833,200
- Project category: Grant-in-Aid for Scientific Research (C)
Estimating selection on amino-acid sequence polymorphisms in Drosophila
- Grant number: NE/D00232X/1
- Fiscal year: 2006
- Amount: $3,833,200
- Project category: Research Grant
Construction of a neural network for detecting novel domains from amino acid sequence information only
- Grant number: 16500189
- Fiscal year: 2004
- Amount: $3,833,200
- Project category: Grant-in-Aid for Scientific Research (C)