UniProt: A Protein Sequence and Function Resource for Biomedical Science

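Basic Information

Award number:
10267787
Principal investigator:
Alex Bateman
Amount:
$3,833,200
Host institution:
EUROPEAN MOLECULAR BIOLOGY LABORATORY
Host institution country:
United States
Project type:
Fiscal year:
2014
Funding country:
United States
Project period:
2014-09-18 to 2026-05-31
Project status:
Ongoing

Source:
https://reporter.nih.gov/project-details/10267787
Keywords: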
Affect Amino Acid Sequence Artificial Intelligence Biomedical Research Catalogs Cells Collaborations Communities Complement Complex Cues Data Data Set Development Disease Disease susceptibility Distance Learning Ensure Environment FAIR principles Genomics Genotype Glean Gold Growth Health Hereditary Disease Human Human Genetics Human Microbiome Individual International Internet Knowledge Knowledge Extraction Literature Machine Learning Methods Modeling Modernization Molecular Molecular Biology Molecular Sequence Data Molecular Structure Ontology Organ Orthologous Gene Outcome Paper Pathway interactions Pattern Pharmaceutical Preparations Phenotype Play Process Production Protein Array Proteins Publications Readability Readiness Research Research Personnel Resources Role Science Shapes Site Standardization Structural Protein System Technology Time Tissues Training Triage Variant Work biomedical data science biomedical resource crowdsourcing data access data reuse deep learning design experience experimental study formycin triphosphate genetic architecture genomic variation hackathon human disease improved innovation knowledge base learning strategy machine learning method macromolecular assembly meetings new technology pathogen personalized diagnostics prognostic protein function response social media symposium text searching web site webinar

Project Summary

This project continues the development of the UniProt Knowledgebase, which aims to provide the scientific community with a comprehensive, high-quality, and freely accessible resource of protein sequences and functional information. Proteins are an essential bridge between human genetics, the environment, and phenotype. Human genetics is increasingly able to find correlations between genotype and phenotype. However, the knowledge of protein function that UniProt provides is essential for the mechanistic understanding needed to improve health outcomes through better, personalized diagnostics, prognostics, and treatments. Biomedical research is being revolutionized by methods from the field of Artificial Intelligence, particularly Machine Learning (ML) approaches such as Deep Learning (DL). These approaches now exceed human performance in many fields and are state-of-the-art when sufficient data are available. UniProt provides gold-standard training data for hundreds of ML applications in biomedical research. The work in this proposal will improve the readiness of UniProt for use in ML and will integrate ML methods to increase our own efficiency. UniProt curators extract and synthesize experimental knowledge of proteins from papers into human- and machine-readable forms using a range of standard ontologies. This proposal will further structure protein knowledge in UniProt. It will develop complete, machine-readable catalogs of the functional impact of human variation and of human protein networks and complexes, which are essential to understanding human disease. Curation efficiency will be improved using DL models, developed in collaboration with text-mining experts, that automate the identification of relevant papers and accelerate knowledge extraction. The extracted knowledge will be validated by our expert curators and by the wider research community, which will be actively engaged to further scale up curation.
ML approaches will also be used to infer annotations for proteins that lack experimental characterization. Community challenges will drive the development of faster, more accurate, and more scalable methods for annotating the deluge of uncharacterized proteins. UniProt is an exemplar FAIR resource and has served the scientific community with regular, metronomic data releases despite exponential growth in data volumes. Streamlined production processes will scale efficiently and sustainably with both the growing volume and the growing complexity of the data. We will explore novel technologies to ensure the continued timely release of data to the community in accordance with the FAIR principles. UniProt is an international hub of protein data that serves hundreds of thousands of users annually. We will continue to use user-centric approaches to develop the UniProt website in response to user needs and new data types. We will engage with our stakeholders and collaborators by introducing an annual strategic partnership meeting. We will engage our communities through webinars, social media, hackathons, and attendance at scientific meetings to broaden the efficient and impactful use of our data.