An Imputation-Consistency Algorithm for Biomedical Complex Data Analysis

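Basic Information

Award number:
9658022
Principal investigator:
FAMING LIANG
Amount:
$303,300
Host institution:
PURDUE UNIVERSITY
Host institution country:
United States
Project category:
Fiscal year:
2018
Funding country:
United States
Project period:
2018-01-01 to 2021-12-31
Project status:
Completed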

Source:
https://reporter.nih.gov/project-details/9658022
Keywords:
Address; Algorithms; Antineoplastic Agents; Big Data; Biological Markers; Biomedical Research; Cations; Clinic; Clinical; Clinical Trials; Complex; Cox Models; Data; Data Analyses; Data Collection; Development; Diagnosis; Digit structure; Dimensions; Disease; Disease Management; Drug-sensitive; FDA approved; Failure; Future; Genes; Genetic Heterogeneity; Genetic Models; Health; Health Care Costs; Health Services; Health system; Heterogeneity; Insulin-Dependent Diabetes Mellitus; Knowledge; Malignant Neoplasms; Mathematics; Messenger RNA; Methods; Modeling; Molecular; Patient Care; Pattern; Performance; Play; Prevention; Property; Public Health; Recipe; Role; Sample Size; Scientist; Statistical Algorithm; Statistical Methods; Technology; Testing; The Cancer Genome Atlas; Variant; Work; anticancer research; biomarker discovery; cancer type; circulating biomarkers; conditioning; data acquisition; disorder risk; expectation; high dimensionality; improved; insight; outcome forecast; personalized medicine; precision medicine; predictive modeling; prognostic tool

Project Summary

The dramatic improvement in data collection and acquisition technologies over the past decades has enabled scientists to collect vast amounts of health-related data from biomedical studies. If analyzed properly, these data will expand our knowledge for testing new hypotheses about disease management, from diagnosis to prevention to personalized treatment. However, biomedical data can be rather complex, and analyzing them poses many challenges for existing methods. This proposal attempts to address three fundamental challenges: (i) Missing data are ubiquitous in biomedical research. How can complex biomedical data be used fully in the presence of missing values? (ii) Growing data size typically brings growing complexity in the patterns in the data and in the models needed to account for those patterns. What is the general recipe for estimating the parameters of complex models? (iii) Biomarker identification from high-throughput omics data has been one of the major focuses of cancer research. Yet despite intense effort, the number of biomarkers approved by the FDA each year for clinical use is still in the single digits. An important factor contributing to this failure is the lack of appropriate statistical methods for analyzing such heterogeneous and high-dimensional data. Toward making full use of complex biomedical data, this project proposes an imputation-consistency algorithm as a general algorithm for high-dimensional missing data problems. The algorithm is then extended to address the other two challenges under the principles of conditioning and consistency; in particular, this project proposes highly efficient and effective statistical algorithms that address the heterogeneity and high-dimensionality issues encountered in biomarker identification and eQTL analysis.
The proposed algorithms are applied to (i) select anticancer drug-sensitive genes with the CCLE and SANGER data, (ii) identify prognostic mRNA biomarkers for multiple types of cancer using the TCGA data, (iii) conduct eQTL analysis for multiple types of cancer using the TCGA data, and (iv) identify informative circulating biomarkers for type 1 diabetes. The proposed methods are highly efficient and general, and can be applied to other types of disease as well. Statistically, this project will develop general, effective, and highly efficient algorithms for complex data analysis; biomedically, it will significantly improve the accuracy of biomarker identification from omics data, advancing the understanding of molecular mechanisms and the development of precision medicine.
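
The summary names an imputation-consistency algorithm but does not spell out its steps. As a rough illustration only, the Python sketch below alternates between imputing missing values from a conditional distribution under the current parameter estimate and re-estimating the parameters with a consistent estimator on the pseudo-complete data. Everything in it is an assumption made for illustration and not the project's actual method: the bivariate normal toy model, the use of sample moments as the estimator, the averaging after a burn-in period, and the function name `impute_then_estimate`. The project targets high-dimensional problems, where the estimation step would likely need a different estimator than plain sample moments.

```python
import numpy as np


def impute_then_estimate(x, y, n_iter=200, burn_in=100, seed=0):
    """Toy imputation/estimation loop for a bivariate normal with missing y.

    x is fully observed; y uses np.nan to mark missing entries.
    Returns the post-burn-in average of the mean vector and covariance matrix.
    Illustrative sketch only (hypothetical helper, not the project's code).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    miss = np.isnan(y)
    obs = ~miss

    # Initialise the parameter estimate from the complete cases.
    mu = np.array([x.mean(), y[obs].mean()])
    cov = np.cov(x[obs], y[obs])

    y_fill = y.copy()
    mu_sum = np.zeros(2)
    cov_sum = np.zeros((2, 2))
    kept = 0

    for t in range(n_iter):
        # Imputation step: draw missing y from its conditional normal
        # distribution given x, under the current estimate (mu, cov).
        slope = cov[0, 1] / cov[0, 0]
        cond_mean = mu[1] + slope * (x[miss] - mu[0])
        cond_var = max(cov[1, 1] - slope * cov[0, 1], 1e-12)
        y_fill[miss] = cond_mean + np.sqrt(cond_var) * rng.standard_normal(miss.sum())

        # Estimation step: sample moments are a consistent estimator
        # when applied to the pseudo-complete data.
        mu = np.array([x.mean(), y_fill.mean()])
        cov = np.cov(x, y_fill)

        # Average the estimates after burn-in to smooth out imputation noise.
        if t >= burn_in:
            mu_sum += mu
            cov_sum += cov
            kept += 1

    return mu_sum / kept, cov_sum / kept


if __name__ == "__main__":
    # Simulated data with about 30% of y missing completely at random.
    rng = np.random.default_rng(1)
    data = rng.multivariate_normal([1.0, -2.0], [[1.0, 0.8], [0.8, 2.0]], size=2000)
    x, y = data[:, 0], data[:, 1].copy()
    y[rng.random(2000) < 0.3] = np.nan
    mu_hat, cov_hat = impute_then_estimate(x, y)
    print(mu_hat)
    print(cov_hat)
```

In this toy setting the values are missing completely at random, so the averaged estimates should land near the simulated mean and covariance; other missingness mechanisms would call for a different imputation model.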
