Unifying Templates, Ontologies and Tools to Achieve Effective Annotation of Bioassay Protocols

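Basic Information

Grant number:
9398728
Principal investigator:
BARRY A BUNIN
Amount:
approx. $546,400
Host institution:
UNIVERSITY OF MIAMI SCHOOL OF MEDICINE
Host institution country:
United States
Project category:
Fiscal year:
2017
Funding country:
United States
Project period:
2017-08-01 to 2021-07-31
Project status:
Completed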

Source:
https://reporter.nih.gov/project-details/9398728
Keywords:
Academia; Address; Adopted; Adoption; Area; Big Data; Biological Assay; Biomedical Research; Chemicals; Communication; Communities; Competence; Complex; Computer software; Computers; Controlled Vocabulary; Custom; Data; Data Set; Data Storage and Retrieval; Development; Ecosystem; Effectiveness; Elements; Ensure; Exercise; FAIR principles; Feedback; Foundations; Hour; Journals; Learning; Librarians; Machine Learning; Manuals; Maps; Metadata; Methods; Ontology; Output; Participant; Pharmaceutical Preparations; Polishes; Problem Solving; Process; Property; Protocols documentation; PubChem; Publishing; Readability; Research; Research Personnel; Retrieval; Risk; Science; Scientist; Semantics; Site; Software Engineering; Software Tools; Specialist; Specific qualifier value; Standardization; Structure; Suggestion; System; Technology; Testing; Text; Time; Translating; Tweens; Update; Vocabulary; Work; base; cost effective; data modeling; design; drug discovery; drug mechanism; experience; experimental study; improved; improved functioning; in vivo; informatics training; novel; open source; practical application; predictive modeling; repository; tool; user-friendly

Project Summary

Biological assays are the foundation for developing chemical probes and drugs, but the Big Data approaches that have revolutionized other areas of biomedical science have not yet advanced this early step of biomedical research: the analysis of assay data. The obstacle is that scientists specify their assays through text descriptions written in scientific English, which must be translated into standardized annotations readable by computers. The lack of standardized, machine-readable assay descriptions is a major impediment to managing, finding, aggregating, comparing, re-using, and learning from the ever-growing corpus of assays (e.g., >1.2 million in PubChem). Thus, there is a critical need for better annotation and curation tools for drug discovery assays. However, going from a simple text protocol to highly detailed, machine-readable semantic annotations is not trivial. Multiple tools and technologies are required: ontologies, i.e., structured controlled vocabularies; templates that map specific vocabularies to the properties to be captured; and software tools that actually apply these ontologies to a given text. Currently, each of these exists in isolation, yet a bottleneck in any one tool or technology, or a gap between the different pieces, disrupts the overall process and results in poor or no annotation of the datasets. Here we propose a project to combine and integrate these three technologies (which are also the core competencies of the three groups collaborating on this proposal). We will deliver a novel, comprehensive, user-friendly data annotation and curation system that is highly interconnected and encompasses the full cycle, and real-world practice, of the tasks and decisions required of all parties within the "bioassay annotation ecosystem" (researchers performing curation, dedicated curators, IT specialists, ontology owners, and librarians/repositories).
The alliance between academic and commercial collaborators, who already work together, will greatly benefit the project and minimize execution risk. Our specific aims are to: (1) develop a bioassay-specific template editor and templates by adapting the Stanford (Center for Expanded Data Annotation and Retrieval, CEDAR) data model to the machine learning-based curation tool BioAssay Express, to exploit the broad functionality of its data structures, tools, and interfaces; (2) define and create an ontology update process and tool ("OntoloBridge") to support rapid feedback between curators/users and ontology experts and to enable semi-automated incorporation of suggested updates into existing published ontologies; (3) develop new tools to export annotated data into public repositories such as PubChem; and (4) evaluate our solution across diverse audiences (pharma, academia, repositories). The system will improve the efficiency, quality, and effectiveness of bioassay curation, enabling scientists to generate standardized annotations for their experiments and so make these data FAIR (Findable, Accessible, Interoperable, Reusable). We envision that this suite of tools will encourage annotation earlier in the data lifecycle while still supporting annotation at later stages (e.g., submission to repositories or to journals).
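
The interplay of the three pieces named in the summary (an ontology, a template that maps vocabulary terms to properties, and a tool that applies them to protocol text) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `EX:` term IDs, the property names, and the `Template`, `suggest`, and `validate` names are hypothetical and are not taken from BioAssay Express, CEDAR, or OntoloBridge. Plain substring matching stands in for the machine learning-based suggestion the project describes.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary: term ID -> human-readable label.
# The IDs are invented for this sketch and belong to no real ontology.
ONTOLOGY = {
    "EX:0001": "luciferase reporter assay",
    "EX:0002": "fluorescence polarization",
    "EX:0101": "Homo sapiens",
    "EX:0102": "Mus musculus",
}


@dataclass
class Template:
    """Maps each property to be captured to the term IDs it may take."""
    allowed: dict[str, set[str]]

    def validate(self, annotations: dict[str, str]) -> list[str]:
        """Return a list of problems; an empty list means the annotation fits."""
        problems = []
        for prop, permitted in self.allowed.items():
            term = annotations.get(prop)
            if term is None:
                problems.append(f"missing: {prop}")
            elif term not in permitted:
                problems.append(f"term {term} not allowed for {prop}")
        return problems


def suggest(text: str, template: Template) -> dict[str, str]:
    """Naive stand-in for an annotation tool: pick the first permitted term
    whose label occurs in the protocol text (case-insensitive)."""
    lowered = text.lower()
    found = {}
    for prop, permitted in template.allowed.items():
        for term_id in sorted(permitted):
            if ONTOLOGY[term_id].lower() in lowered:
                found[prop] = term_id
                break
    return found


if __name__ == "__main__":
    template = Template({
        "assay method": {"EX:0001", "EX:0002"},
        "organism": {"EX:0101", "EX:0102"},
    })
    protocol = "A luciferase reporter assay was run in Homo sapiens HEK293 cells."
    annotations = suggest(protocol, template)
    print(annotations)
    print(template.validate(annotations))
```

A protocol that mentions no known label for a property comes back from `validate` as "missing", which is the kind of gap that, per the summary, would be reported to ontology experts as a suggested vocabulary update.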
