
On Statistical Modeling and Parameter Estimation for High Dimensional Systems


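Basic information

Award number:
1612924
Principal investigator:
Faming Liang
Amount:
$150,000
Institution:
University of Florida
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2016
Funding country:
United States
Award period:
2016-09-01 to 2018-01-31
Project status:
Completed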

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1612924&HistoricalAwards=false
Keywords:
Statistical Modeling Parameter Estimation Dimensional

Project abstract

The dramatic improvements in data collection and acquisition technologies over the last decades have enabled scientists to collect massive amounts of high-dimensional data that allow for monitoring and studying of complex systems. Due to their intrinsic nature, many of the high-dimensional datasets, such as omics and genome-wide association study (GWAS) data, have a much smaller sample size compared to the dimension (referred to as the small-n-large-P problem). Current research on statistical modeling of small-n-large-P data focuses on linear and generalized linear models. However, these approaches are often not adequate for modeling complex systems, and estimation of the model parameters is challenging. This project addresses two fundamental problems, statistical modeling and parameter estimation, toward a valid statistical analysis of high-dimensional data. Successful completion of this project will generate hands-on tools for statistical inference of high-dimensional complex systems, which can benefit researchers in many areas of science and technology. In particular, the proposed applications to biomedical studies will lead to accurate tools for detecting biomarkers associated with disease processes and tailoring optimal therapy for individual patients with complex diseases. The research results will be disseminated to the statistical and biomedical communities, via collaboration, conference presentations, books, and articles to be published in academic journals. The project will also have significant impact on education through the involvement of graduate students in the project, and incorporation of results into undergraduate and graduate courses. In addition, the R package developed under this project will provide a valuable tool for statistical analysis of high-dimensional data.

The current approach to modeling small-n-large-P data focuses on linear and generalized linear models, and casts the problem as variable selection by imposing a sparsity constraint on parameter values. Although these models have many advantages, such as simplicity and computational efficiency, estimation of the parameters is still a challenging problem. While regularization is often used in these situations, it can perform poorly when the sample size is small and the variables are highly correlated. Two new methods are proposed to address these concerns, namely, Bayesian neural network (BNN) and blockwise coordinate consistency (BCC). The BNN method works by first fitting the data with a feed-forward neural network, conducting variable selection through network structure selection under a Bayesian framework, and resolving the associated computational difficulty via parallel computing. Compared to existing methods, BNN can lead to much more precise selection of relevant variables and outcome prediction for high-dimensional nonlinear systems. The BCC method works by maximizing a new objective function, the expectation of the log-likelihood function, using a cyclic algorithm and iteratively finding consistent estimates for each block of parameters conditional on the current estimates of the other parameters. The BCC method reduces the high-dimensional parameter estimation problem to a series of low-dimensional parameter estimation problems. The preliminary results indicate that BCC can provide a drastic improvement in both parameter estimation and variable selection over regularization methods. The validity of the proposed methods will be rigorously studied and applied to biomarker discovery, precision medicine, and joint estimation of the regression coefficients and precision matrix for high-dimensional multivariate regression.
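
The abstract describes BCC only at a high level, so the following is a minimal sketch of the general idea of cyclic, blockwise updates rather than the BCC method itself. As a stand-in objective it uses ridge-penalized least squares, chosen only because each block update then has a closed form. The function name blockwise_cyclic_ridge, the penalty lam, the block size, and the simulated data are all assumptions made for illustration and do not come from the project.

```python
import numpy as np


def blockwise_cyclic_ridge(X, y, blocks, lam=5.0, max_sweeps=5000, tol=1e-12):
    """Cyclic block updates for ridge regression.

    Illustrative stand-in only: this minimizes
    0.5 * ||y - X b||^2 + 0.5 * lam * ||b||^2 one block at a time.
    It is NOT the BCC objective (an expected log-likelihood).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_sweeps):
        beta_old = beta.copy()
        for idx in blocks:
            Xb = X[:, idx]
            # Partial residual: remove the current fit of all other blocks.
            r = y - X @ beta + Xb @ beta[idx]
            # Low-dimensional solve for this block, other blocks held fixed.
            A = Xb.T @ Xb + lam * np.eye(len(idx))
            beta[idx] = np.linalg.solve(A, Xb.T @ r)
        # Stop once a full sweep no longer changes the estimate.
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta


# Simulated small-n-large-P data (hypothetical sizes and coefficients).
rng = np.random.default_rng(0)
n, p, lam = 30, 60, 5.0
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Six blocks of ten parameters each.
blocks = [np.arange(start, start + 10) for start in range(0, p, 10)]
beta_block = blockwise_cyclic_ridge(X, y, blocks, lam=lam)

# Reference: solve the full p-dimensional ridge system in one step.
beta_full = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
max_gap = float(np.max(np.abs(beta_block - beta_full)))
```

Each inner step solves a 10-dimensional system instead of the full 60-dimensional one, which mirrors the reduction to a series of low-dimensional problems described in the abstract. Because the ridge objective is strictly convex, the block sweeps approach the same solution as the direct solve; the statistical consistency properties claimed for BCC are a separate matter that this sketch does not address.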


Project outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)


Other publications by Faming Liang

Bayesian phylogeny analysis via stochastic approximation Monte Carlo

DOI:
10.1016/j.ympev.2009.06.019
Published:
2009-11-01
Journal:
Molecular Phylogenetics and Evolution
Impact factor:
Authors:
Sooyoung Cheon; Faming Liang
Corresponding author:
Faming Liang

Networks Involved in Coronary Collateral Formation


DOI:
Published:
Journal:
Impact factor:
0
Authors:
Jian Zhang; J. Regieli; M. Schipper; M. M. Entius; Faming Liang; J. Koerselman; H. J. Ruven; Yolanda van der Graaf; D. Grobbee; Pieter A. Doevendans
Corresponding author:
Pieter A. Doevendans