
Crowd-Assisted Deep Learning (CrADLe) Digital Curation to Translate Big Data into Precision Medicine


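Basic Information

Grant number:
9403171
Principal investigator:
Dexter D Hadley
Amount:
$548,100 (approx.)
Host institution:
UNIVERSITY OF CALIFORNIA, SAN FRANCISCO
Host institution country:
United States
Project category:
Fiscal year:
2017
Funding country:
United States
Project period:
2017-08-01 to 2021-07-31
Project status:
Completed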

Source:
https://reporter.nih.gov/project-details/9403171
Keywords:
Algorithms Alzheimer's Disease Animal Model Artificial Intelligence Big Data Big Data to Knowledge Biological Biological Assay Categories Cell Line Cell model Classification Clinical Collaborations Communities Controlled Vocabulary Crowding Data Data Quality Data Set Defect Deposition Diagnosis Disease Drug Modelings E-learning Effectiveness Engineering Funding Funding Agency Future Gene Expression Gene Targeting Genomics Human Image Intelligence Label Learning Link Logic Machine Learning Malignant Neoplasms Maps Measures Medical Medicine Meta-Analysis Metadata Methods Modeling Molecular Molecular Profiling National Research Council Natural Language Processing Ontology Pathway interactions Patients Pattern Peer Review Performance Pharmaceutical Preparations Physicians Problem Solving PubMed Public Domains Publications Resources Sampling Scientific Inquiry Scientist Source Specific qualifier value Speed Subject Headings Text The Cancer Genome Atlas Training Translating United States National Institutes of Health Validation Work base big biomedical data biomarker discovery burden of illness cell type classical conditioning computer program crowdsourcing digital disease phenotype experimental study genomic data human disease improved knockout gene novel therapeutics open data potential biomarker precision medicine programs repository specific biomarkers

Project Summary

The NIH and other agencies are funding high-throughput genomics (‘omics) experiments that deposit digital samples of data into the public domain at breakneck speed. These high-quality data measure the ‘omics of diseases, drugs, cell lines, model organisms, etc. across the complete gamut of experimental factors and conditions. The importance of these digital samples is further illustrated by linked peer-reviewed publications that demonstrate their scientific value. However, metadata for digital samples is recorded as free text, without the biocuration necessary for in-depth downstream scientific inquiry. Deep learning is a revolutionary machine intelligence paradigm that allows an algorithm to program itself, thereby removing the need to explicitly specify rules or logic. Whereas physicians and scientists once needed to understand a problem before programming computers to solve it, deep learning algorithms optimally tune themselves to solve problems. Given enough example data to train on, deep learning machine intelligence outperforms humans on a variety of tasks. Today, deep learning delivers state-of-the-art performance in image classification and, most importantly for this proposal, in natural language processing. This proposal is about engineering Crowd-Assisted Deep Learning (CrADLe) machine intelligence to rapidly scale the digital curation of public digital samples. We will first use our NIH BD2K-funded Search Tag Analyze Resource for Gene Expression Omnibus (STARGEO.org) to crowd-source human annotation of open digital samples. We will then develop and train deep learning algorithms for STARGEO digital curation that learn from the free-text metadata associated with each digital sample. Given the ongoing deluge of biomedical data in the public domain, CrADLe may be the only way to scale digital curation towards a precision medicine ideal.
Finally, we will demonstrate the biological utility of CrADLe for digital curation with two large-scale, independent molecular datasets: 1) The Cancer Genome Atlas (TCGA), and 2) the Accelerating Medicines Partnership-Alzheimer’s Disease (AMP-AD). We posit that CrADLe digital curation of open samples will augment these two distinct disease projects with a host of big data to fuel the discovery of potential biomarkers and gene targets. Therefore, successful funding and completion of this work may greatly reduce the burden of disease on patients by enhancing the efficiency and effectiveness of digital curation for biomedical big data.
