PEBank: A database for protein engineering data

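Basic Information

Grant number:
9141872
Principal investigator:
Barry D Olafson
Amount:
$219,300
Host institution:
PROTABIT, LLC
Host institution country:
United States
Project category:
Fiscal year:
2016
Funding country:
United States
Project period:
2016-06-01 to 2016-11-30
Project status:
Completed

Project Abstract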

DESCRIPTION (provided by applicant): Engineered proteins such as therapeutic antibodies, specialized enzymes for drug manufacturing, and proteins used to identify new small molecule drugs are making significant contributions to improve health care. Protein therapeutics alone represent a $100+ billion market that is rapidly growing and has broad applications in the treatment of cancer, metabolic diseases, and other disorders. These advances have been made possible, in part, by the free and easy access to data in the form of nucleotide sequences (GenBank) and protein structures (Protein Data Bank, PDB). Both of these databases have grown exponentially and continue to organize and structure data in a manner that would be hard for individual groups or companies to maintain on their own. A new type of data is emerging in the protein engineering community that is not stored in GenBank or the PDB: engineered protein sequences and their associated experimental assay data. The protein engineering community is at a relatively early stage of development compared to the sequence or structure determination communities. Thus, the time is ripe to develop a database to organize the data from protein engineering studies into a cohesive and comprehensive dataset. We will call this database PEBank. In Phase I, PEBank development will include: (1) drafting a specification for Version 1.0, with feedback from representatives from GenBank and the PDB, that describes the types of data to be stored and lays out the organizational hierarchy of the data; (2) implementing a prototype of Version 1.0 of PEBank and garnering feedback from the protein engineering community; (3) implementing a cloud-based version of PEBank; and (4) creating web-based utilities for depositing, viewing, and analyzing data.
In Phase II, we will continue development of PEBank by: (1) creating a version that will allow write privileges and hosting it on Amazon Web Services; (2) providing support for PEBank users; (3) developing a secure limited-access version of PEBank that will hold customer-specific proprietary data; (4) developing tools that will validate the integrity of the data and policies to handle invalid data; (5) developing web-enabled search tools to extract data from PEBank; (6) testing data deposit and viewing, and making PEBank available to the academic community; and (7) developing advanced analysis tools for finding statistical correlations between various data elements. We will also begin to use the analysis tools and PEBank data to optimize the predictive capability of our computational protein design software; this will include improving the underlying score functions and developing dynamic design tools that integrate database interrogation with the sequence optimization process. When complete, PEBank will allow protein engineers around the world to access protein engineering data in a standard format that can be easily accessed, searched, and shared; this data can be used to inform their designs and to develop more predictive protein design tools, thus accelerating the development of new and improved proteins for therapeutic, diagnostic, and other health-related applications.
