CAREER: Sparse Modeling Driven by Large-Scale Genomic Data

Basic Information

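  • Award number:
    1055286
  • Principal investigator:
  • Amount:
    $400,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2011
  • Funding country:
    United States
  • Period:
    2011-06-01 to 2017-05-31
  • Status:
    Completed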

Project Abstract

Massive, high-dimensional data arise frequently in biology and genomics. Regularization and sparsity are critical components for modeling such data and extracting information from them. The broad success of sparse modeling methods such as the Lasso has spurred rapid development in this area. However, most existing methods were developed within the frameworks of linear models and generalized linear models, and the complex structures in genomic data call for methods that go beyond them. To this end, three novel sparse modeling methods are proposed, with sophisticated model structures driven by large-scale gene expression data, protein binding data and DNA sequence data. The first method, motivated by modeling the relationship between protein binding and gene expression, constructs linear regression models on the terminal nodes of a decision tree. The decision tree partitions the population into subgroups according to the predictors, and each subgroup has its own sparse linear regression model relating the response to the predictors. Two types of regularization, one on the regression coefficients and the other on the size of the tree, are used to encourage sparsity. The second method constructs tight clusters for gene expression data by penalizing the differences in grouped parameters between two tight clusters and between a tight cluster and the null cluster. Block-wise coordinate descent combined with majorization is developed to maximize the resulting regularized likelihood function. The third method, motivated by the motif-finding problem, aims at sequence pattern discovery. A dictionary model is used to partition a sentence into words, which represent sequence patterns, and single letters. A novel regularization based on the Kullback-Leibler divergence is developed for the product-multinomial model of words, which can achieve sparsity in estimating the cell probabilities. This regularization is used to construct a sparse dictionary that contains only a small number of words. A generalized EM algorithm is proposed for parameter estimation and solution path construction.
As efficient analysis of large-scale, high-dimensional data is critical in many fields of science and engineering, the proposed research is of great current interest. In particular, the proposed methods are ready to be applied in cutting-edge research areas of genomics and molecular biology, where massive data sets are continuously being generated. To accelerate such applications, free software packages and self-contained programs are being developed so that users can analyze their own data. In addition, this proposal contains many innovative statistical methodologies that may contribute significantly to statistics and computational science. Finally, the proposed research is integrated with education through the development of new courses and the improvement of existing ones at both the undergraduate and graduate levels.
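The abstract describes the first method (sparse regressions on the terminal nodes of a decision tree) only at a high level. As a rough illustration of how the two penalties can trade off, the Python sketch below assumes a lasso penalty `lam * ||b||_1` on each leaf's coefficients (with an unpenalized intercept), a fixed cost `gamma` per terminal node as the tree-size penalty, and a search restricted to a single split over a quantile grid. The functions `soft_threshold`, `lasso_cd`, `fit_leaf` and `best_stump`, and the simulated data, are hypothetical; this is not the project's algorithm or software.

```python
import numpy as np


def soft_threshold(z, t):
    """Lasso shrinkage operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_total, n_iter=500, tol=1e-8):
    """Minimise ||y - X b||^2 / (2 * n_total) + lam * ||b||_1 by cyclic
    coordinate descent. X and y are assumed to be centred already."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n_total
    r = y.astype(float).copy()  # current residual y - X b
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:  # constant column inside this leaf
                continue
            old = b[j]
            # Correlation of column j with the partial residual (b_j removed).
            rho = X[:, j] @ r / n_total + col_sq[j] * old
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * (b[j] - old)
            max_step = max(max_step, abs(b[j] - old))
        if max_step < tol:
            break
    return b


def fit_leaf(X, y, lam, n_total):
    """Sparse linear model in one terminal node; intercept is unpenalised.
    Dividing by the total sample size keeps leaf costs additive."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    b = lasso_cd(Xc, yc, lam, n_total)
    resid = yc - Xc @ b
    cost = resid @ resid / (2 * n_total) + lam * np.abs(b).sum()
    return b, y_mean - x_mean @ b, cost


def best_stump(X, y, lam, gamma, min_leaf=10):
    """Compare a one-leaf tree with every depth-1 split on a quantile grid.
    Assumed objective: sum of leaf costs + gamma * (number of leaves)."""
    n = len(y)
    best_obj = fit_leaf(X, y, lam, n)[2] + gamma  # no split, one leaf
    best_split = None
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], np.linspace(0.1, 0.9, 9)):
            left = X[:, j] <= t
            if left.sum() < min_leaf or (~left).sum() < min_leaf:
                continue
            obj = (fit_leaf(X[left], y[left], lam, n)[2]
                   + fit_leaf(X[~left], y[~left], lam, n)[2] + 2 * gamma)
            if obj < best_obj:
                best_obj, best_split = obj, (j, float(t))
    return best_split, best_obj


# Hypothetical simulated data: the slope on x1 flips sign with x0,
# so no single global linear model fits well; x2 is pure noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))
y = np.where(X[:, 0] <= 0, 2.0, -2.0) * X[:, 1] + 0.1 * rng.normal(size=200)

split, obj = best_stump(X, y, lam=0.05, gamma=0.01)
print(split, obj)
```

A split is kept only when it lowers the summed leaf costs by more than the extra `gamma` it incurs, so raising `gamma` prunes the model back to a single global sparse regression. A realistic version would grow and prune deeper trees; this sketch stops at depth one to keep the trade-off visible.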

Project Outcomes

Journal articles (0)
Monographs (0)
Scientific awards (0)
Conference papers (0)
Patents (0)

Other Publications by Qing Zhou

Effects of booster seat sliding on responses and injuries of child occupant
Mitochondrial dysfunction caused by SIRT3 inhibition drives pro-inflammatory macrophage polarization in obesity
  • DOI:
    10.1002/oby.23707
  • Publication date:
  • Journal:
  • Impact factor:
    6.9
  • Authors:
    Qing Zhou;Yuyan Wang;Zongshi Lu;Bowen Wang;Li Li;Mei You;Lijuan Wang;Tingbing Cao;Yu Zhao;Qiang Li;Aidi Mou;Wentao Shu;Hongbo He;Zhigang Zhao;Daoyan Liu;Zhiming Zhu;Peng Gao;Zhencheng Yan
  • Corresponding author:
    Zhencheng Yan
Differential expression of CD300a/c on human TH1 and TH17 cells
  • DOI:
  • Publication date:
    2011
  • Journal:
  • Impact factor:
    3
  • Authors:
    V. R. Simhadri;John L. Mariano;Qing Zhou;K. Debell;F. Borrego
  • Corresponding author:
    F. Borrego
Shape controlled flower-like silicon oxide nanowires and their pH response
  • DOI:
    10.1016/j.apsusc.2011.01.038
  • Publication date:
    2011
  • Journal:
  • Impact factor:
    6.7
  • Authors:
    Qi Shao;R. Que;Mingwang Shao;Qing Zhou;D. Ma;Shuitong Lee
  • Corresponding author:
    Shuitong Lee
HER2 Activation Factors in Arsenite-Exposed Bladder Epithelial Cells
  • DOI:
    10.1093/toxsci/kfy202
  • Publication date:
    2018-08
  • Journal:
  • Impact factor:
    3.8
  • Authors:
    Peiyu Jin;Jieyu Liu;Xiaoyan Wang;Li Yang;Qing Zhou;Xiaoli Lin;Shuhua Xi
  • Corresponding author:
    Shuhua Xi

Other Grants by Qing Zhou

CDS&E-MSS: Causal Induction in Sequential Decision Processes
  • Award number:
    2305631
  • Fiscal year:
    2023
  • Award amount:
    $400,000
  • Project type:
    Continuing Grant
CDS&E-MSS: Causal learning and inference on complex observational data
  • Award number:
    1952929
  • Fiscal year:
    2020
  • Award amount:
    $400,000
  • Project type:
    Standard Grant
BIGDATA: F: Learning Big Bayesian Networks
  • Award number:
    1546098
  • Fiscal year:
    2015
  • Award amount:
    $400,000
  • Project type:
    Standard Grant
Monte Carlo methods for complex multimodal distributions with applications in Bayesian inference
  • Award number:
    1308376
  • Fiscal year:
    2013
  • Award amount:
    $400,000
  • Project type:
    Standard Grant
Statistical Methods for Integrated Gene Regulation Analyses
  • Award number:
    0805491
  • Fiscal year:
    2008
  • Award amount:
    $400,000
  • Project type:
    Continuing Grant

Similar National Natural Science Foundation of China (NSFC) Grants

SAR Image Noise Suppression and Segmentation Based on the Sparse-Land Model
  • Award number:
    60971128
  • Approval year:
    2009
  • Award amount:
    CNY 300,000
  • Project type:
    General Program

Similar Overseas Grants

Network Topology Recovery Method using Sparse Modeling
  • Award number:
    22KJ3056
  • Fiscal year:
    2023
  • Award amount:
    $400,000
  • Project type:
    Grant-in-Aid for JSPS Fellows
Machine Learning for Signal Analysis and System Modeling: Sparse and Event Driven Strategies
  • Award number:
    RGPIN-2017-05939
  • Fiscal year:
    2021
  • Award amount:
    $400,000
  • Project type:
    Discovery Grants Program - Individual
Sparse Signal Processing and Modeling of High Dimensional Spatio-Temporal Data
  • Award number:
    RGPIN-2017-03840
  • Fiscal year:
    2021
  • Award amount:
    $400,000
  • Project type:
    Discovery Grants Program - Individual
Analysis of bounding flight in birds by dynamic sparse modeling and its application to drones
  • Award number:
    20K21008
  • Fiscal year:
    2020
  • Award amount:
    $400,000
  • Project type:
    Grant-in-Aid for Challenging Research (Exploratory)
Sparse Signal Processing and Modeling of High Dimensional Spatio-Temporal Data
  • Award number:
    RGPIN-2017-03840
  • Fiscal year:
    2020
  • Award amount:
    $400,000
  • Project type:
    Discovery Grants Program - Individual
A new signal reconstruction method combining sparse modeling and optimal interpolation approximation theory
  • Award number:
    20K04489
  • Fiscal year:
    2020
  • Award amount:
    $400,000
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Machine Learning for Signal Analysis and System Modeling: Sparse and Event Driven Strategies
  • Award number:
    RGPIN-2017-05939
  • Fiscal year:
    2020
  • Award amount:
    $400,000
  • Project type:
    Discovery Grants Program - Individual
Development of next-generation encoder by sparse modeling using frequency domain analysis of super resolution image reconstruction.
  • Award number:
    20K21830
  • Fiscal year:
    2020
  • Award amount:
    $400,000
  • Project type:
    Grant-in-Aid for Challenging Research (Exploratory)
CAREER: Adding to the Future: Thermal Modeling, Sparse Sensing, and Integrated Controls for Precise and Reliable Powder Bed Fusion
  • Award number:
    1953155
  • Fiscal year:
    2019
  • Award amount:
    $400,000
  • Project type:
    Standard Grant
Sparse Signal Processing and Modeling of High Dimensional Spatio-Temporal Data
  • Award number:
    RGPIN-2017-03840
  • Fiscal year:
    2019
  • Award amount:
    $400,000
  • Project type:
    Discovery Grants Program - Individual