Scalable Biomedical Pattern Recognition Via Deep Learning

Basic information

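Grant number:
8689173
Principal investigator:
Thomas Lasko
Amount:
approx. $202,700
Host institution:
VANDERBILT UNIVERSITY
Host institution country:
United States
Project category:
Fiscal year:
2013
Funding country:
United States
Project period:
2013-07-01 to 2016-04-29
Project status:
Completed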

Source:
https://reporter.nih.gov/project-details/8689173
Keywords:
Acquired Immunodeficiency Syndrome; Algorithms; Architecture; Area; Biomedical Research; Caring; Classification; Clinical; Clinical Data; Computational algorithm; Computerized Medical Record; Couples; Data; Data Set; Data Sources; Dependence; Development; Diabetes Mellitus; Diagnosis; Disease; Engineering; Evaluation; Exhibits; Feedback; Goals; Gold; Healthcare Systems; Human; Image; Individual; Judgment; Knowledge; Label; Laboratories; Learning; Literature; Manuals; Measures; Medical; Metformin; Methods; Modeling; Myocardial Infarction; Nature; Non-Insulin-Dependent Diabetes Mellitus; Outcome; PTGS2 gene; Pattern; Pattern Recognition; Performance; Pharmaceutical Preparations; Phenotype; Pilot Projects; Population; Probability; Process; ROC Curve; Records; Research; Risk; Sampling; Sea; Serum; Single Nucleotide Polymorphism; Source; Specific qualifier value; Structure; Testing; Time; To specify; Uncertainty; Uric Acid; Work; cell growth; clinical care; clinical phenotype; density; diabetic; genetic association; genetic variant; inhibitor/antagonist; interest; meetings; neoplastic cell; non-diabetic; outcome forecast; prevent; speech recognition; success; type I and type II diabetes

Project abstract

DESCRIPTION (provided by applicant): Patterns extracted from Electronic Medical Records (EMRs) and other biomedical datasets can provide valuable feedback to a learning healthcare system, but our ability to find them is limited by certain manual steps. The dominant approach to finding the patterns uses supervised learning, where a computational algorithm searches for patterns among input variables (or features) that model an outcome variable (or label). This usually requires an expert to specify the learning task, construct input features, and prepare the outcome labels. This workflow has served us well for decades, but the dependence on human effort prevents it from scaling and it misses the most informative patterns, which are almost by definition the ones that nobody anticipates. It is poorly suited to the emerging era of population-scale data, in which we can conceive of massive new undertakings such as surveilling for all emerging diseases, detecting all unanticipated medication effects, or inferring the complete clinical phenotype of all genetic variants. The approach of unsupervised feature learning overcomes these limitations by identifying meaningful patterns in massive, unlabeled datasets with little or no human involvement. While there is a large literature on feature creation, a new surge of interest in unsupervised methods is being driven by the recent development of deep learning, in which a compact hierarchy of expressive features is learned from large unlabeled datasets. In the domains of image and speech recognition, deep learning has produced features that meet or exceed (by as much as 70%) the previous state of the art on difficult standardized tasks. Unfortunately, the noisy, sparse, and irregular data typically found in an EMR is a poor substrate for deep learning.
Our approach uses Gaussian process regression to convert such an irregular sequence of observations into a longitudinal probability density that is suitable for use with a deep architecture. With this approach, we can learn continuous unsupervised features that capture the longitudinal structure of sparse and irregular observations. In our preliminary results, unsupervised features were as powerful (0.96 AUC) in an unanticipated classification task as gold-standard features engineered by an expert with full knowledge of the domain, the classification task, and the class labels. In this project we will learn unsupervised features for records of all individuals in our de-identified EMR image, for each of 100 laboratory tests and 200 medications of relevance to type 1 or type 2 diabetes. We will evaluate the features using three pattern recognition tasks that were unknown to the feature-learning algorithm: 1) an easy supervised classification task of distinguishing diabetics vs. nondiabetics, 2) a much more difficult task of distinguishing type 1 vs. type 2 diabetics, and 3) a genetic association task that considers the features as micro-phenotypes and measures their association with 29 different single nucleotide polymorphisms with known associations to type 1 or type 2 diabetes.
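
The Gaussian process regression step described in the abstract can be illustrated with a minimal sketch. It is not the applicants' implementation: the squared-exponential kernel, the hyperparameters (`length_scale`, `signal_var`, `noise_var`), the function names, and the lab values are all assumptions chosen for illustration. The sketch turns a handful of irregularly timed measurements into a posterior mean and variance on a regular time grid, that is, one Gaussian density per grid point, which is the kind of regular input a deep architecture could consume.

```python
import math

def sq_exp_kernel(a, b, length_scale=30.0, signal_var=1.0):
    """Squared-exponential covariance between two time points (in days)."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_upper_from_lower(L, b):
    """Back substitution: solve L^T x = b using the lower factor L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / L[i][i]
    return x

def gp_posterior(times, values, grid, noise_var=0.1, **kernel_args):
    """Posterior mean and variance at each grid time.

    The GP prior mean is a constant set to the sample mean of the
    observations (an assumption made for this sketch).
    """
    mean_y = sum(values) / len(values)
    centered = [v - mean_y for v in values]
    n = len(times)
    # Covariance of the noisy observations: kernel plus noise on the diagonal.
    K = [[sq_exp_kernel(times[i], times[j], **kernel_args)
          + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_from_lower(L, solve_lower(L, centered))  # K^{-1} y
    means, variances = [], []
    for g in grid:
        k_star = [sq_exp_kernel(g, t, **kernel_args) for t in times]
        v = solve_lower(L, k_star)
        means.append(mean_y + sum(k * a for k, a in zip(k_star, alpha)))
        variances.append(sq_exp_kernel(g, g, **kernel_args) - sum(x * x for x in v))
    return means, variances

# Hypothetical, irregularly timed lab values (days since first visit, arbitrary units).
times = [0.0, 3.0, 40.0, 45.0, 200.0]
values = [5.1, 5.4, 6.8, 7.0, 6.1]
grid = [float(d) for d in range(0, 211, 30)]  # regular 30-day grid
means, variances = gp_posterior(times, values, grid)
```

In this sketch the posterior variance is small near clusters of observations and grows toward the prior variance in long gaps between visits, so the uncertainty caused by sparse sampling is carried forward explicitly instead of being hidden by interpolation.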
