CAREER: Scalable methods for discovering multivariate dependencies in high dimensional data.
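Basic information
- Award number: 1352656
- Principal investigator:
- Amount: $400,000
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2014
- Funding country: United States
- Project period: 2014-07-01 to 2019-03-31
- Project status: Completed
- Source:
- Keywords:
Project abstract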
This proposal aims to develop principled methods for discovering multivariate dependencies in ultra-high-dimensional settings. A common theme uniting the proposed methods is scalability, together with the identification of their limitations. A popular approach to identifying sparse inverse covariance matrices is through penalized likelihood methods. We propose a novel approach for solving the penalized Gaussian log-likelihood problem that is faster than its competitors by many orders of magnitude. The second research component investigates the statistical properties of thresholded matrices in finite samples, with a view to obtaining a highly scalable positive definite covariance estimation method. The third research component addresses quantifying the variability and uncertainty of estimated graphical network models. A methodology that takes advantage of a convex pseudo-likelihood formulation of the graphical model selection problem is introduced. This allows for the development of a highly scalable uncertainty quantification method with theoretical safeguards. The fourth research component applies the methodology proposed in the previous three components to a problem in the area of climate change, where high dimensional covariance estimation is required. The proposal also has a significant teaching and outreach component which aims to introduce statistics to aspiring young scientists at various stages of their undergraduate and graduate studies.
The availability of high-throughput data from various applications, including genomics, environmental sciences and others, has created an urgent need for methodology and tools for analyzing high dimensional data. Extracting and making sense of the many complex relationships and multivariate dependencies in the data, and developing principled inferential procedures, is one of the major challenges facing statisticians and data scientists.
The theoretical and methodological work proposed in this project is motivated by applications and interdisciplinary collaborations in fields as diverse as the earth and environmental sciences, genomics and cancer research, and the social sciences. In genomics, for instance, one is often interested in knowing how various genes are associated, and how these associations differ between an experimental (diseased) group and a control group. Gene regulatory networks also serve as important tools for studying the evolution of diseases. In the context of the climate change debate, modeling temperature at different points on the globe requires parsimonious modeling of the way in which these variables are related. Modeling correlations also arises naturally in materials science and engineering, where one is interested in seeing how different atomic particles interact when new materials are produced. Hence the proposed project for estimating correlations in very high dimensional settings will have widespread applications, since understanding associations and relationships between many variables is an endeavor common to many scientific disciplines. The proposed work, though firmly rooted in the statistical sciences, is very much interdisciplinary, and involves collaborations and partnerships between statisticians and data scientists on the one hand and biomedical scientists, engineers and earth scientists on the other.
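As a rough illustration of the first two research components described in the abstract, the following Python sketch (standard library only) evaluates one common form of the penalized Gaussian negative log-likelihood, tr(S Omega) - log det(Omega) + lambda * sum of |Omega_ij|, and shows how hard-thresholding a covariance matrix can destroy positive definiteness. It is not the project's algorithm. The function names, the choice to penalize only off-diagonal entries, the 3x3 example matrix, and the threshold 0.7 are all assumptions made for illustration.

```python
import math


def cholesky(matrix):
    """Return the lower-triangular Cholesky factor of a symmetric matrix,
    or None if the matrix is not (numerically) positive definite."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    return None  # a non-positive pivot rules out positive definiteness
                lower[i][i] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def is_positive_definite(matrix):
    return cholesky(matrix) is not None


def hard_threshold(cov, level):
    """Zero every off-diagonal entry whose magnitude is below `level`."""
    n = len(cov)
    return [[cov[i][j] if i == j or abs(cov[i][j]) >= level else 0.0
             for j in range(n)] for i in range(n)]


def penalized_neg_loglik(omega, cov, lam):
    """Penalized Gaussian negative log-likelihood, up to constants and scaling.
    Assumption: the L1 penalty is applied to off-diagonal entries only."""
    lower = cholesky(omega)
    if lower is None:
        return math.inf  # the objective is finite only for positive definite omega
    n = len(omega)
    trace_term = sum(cov[i][j] * omega[j][i] for i in range(n) for j in range(n))
    log_det = 2.0 * sum(math.log(lower[i][i]) for i in range(n))
    penalty = lam * sum(abs(omega[i][j])
                        for i in range(n) for j in range(n) if i != j)
    return trace_term - log_det + penalty


# Hypothetical 3x3 covariance matrix, chosen only for illustration.
cov = [[1.0, 0.8, 0.6],
       [0.8, 1.0, 0.8],
       [0.6, 0.8, 1.0]]
thresholded = hard_threshold(cov, 0.7)
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

print(is_positive_definite(cov))           # True
print(is_positive_definite(thresholded))   # False
print(penalized_neg_loglik(identity, cov, 0.1))  # 3.0
```

In this example the original matrix is positive definite, but once the 0.6 entries are zeroed the Cholesky factorization fails. This illustrates why positive definiteness is a concern when covariance matrices are estimated by thresholding.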
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants held by Balakanapathy Rajaratnam
CAREER: Scalable methods for discovering multivariate dependencies in high dimensional data.
- Award number: 1916787
- Fiscal year: 2017
- Amount: $400,000
- Award type: Continuing Grant
Collaborative Research: Objective Bayesian Model Selection and Estimation in High Dimensional Statistical Models
- Award number: 1106642
- Fiscal year: 2011
- Amount: $400,000
- Award type: Standard Grant
CMG Collaborative Research: Efficient high dimensional Bayesian methods for climate field reconstruction
- Award number: 1025465
- Fiscal year: 2010
- Amount: $400,000
- Award type: Standard Grant
Collaborative Research: P2C2--Multiproxy Reconstructions as A Missing-Data Problem: New Techniques and their Application to Regional Climates of the Past Millennium
- Award number: 1003823
- Fiscal year: 2010
- Amount: $400,000
- Award type: Standard Grant
Exploring and detecting complex multivariate dependencies through sparse graphical models
- Award number: 0906392
- Fiscal year: 2009
- Amount: $400,000
- Award type: Standard Grant
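Similar grants from the National Natural Science Foundation of China
Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
- Award number:
- Approval year: 2024
- Amount (10,000 CNY):
- Award type: Collaborative Innovation Research Team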
Similar overseas grants
Innovating and Validating Scalable Monte Carlo Methods
- Award number: DE240101190
- Fiscal year: 2024
- Amount: $400,000
- Award type: Discovery Early Career Researcher Award
Creating harmonised and scalable methods and tools for constructing households in large diverse administrative and health research datasets
- Award number: ES/X00046X/1
- Fiscal year: 2023
- Amount: $400,000
- Award type: Research Grant
A scalable cloud-based framework for multi-modal mapping across single neuron omics, morphology and electrophysiology
- Award number: 10725550
- Fiscal year: 2023
- Amount: $400,000
- Award type:
Unified, Scalable, and Reproducible Neurostatistical Software
- Award number: 10725500
- Fiscal year: 2023
- Amount: $400,000
- Award type:
In vivo Perturb-map: scalable genetic screens with single-cell and spatial resolution in intact tissues
- Award number: 10578616
- Fiscal year: 2023
- Amount: $400,000
- Award type:
A Uniquely Scalable Approach to Sequence Tens of Millions of Single Cells Without Compromising Performance
- Award number: 10700398
- Fiscal year: 2023
- Amount: $400,000
- Award type:
Scalable and quantitative chromatin profiling from formalin-fixed paraffin-embedded samples
- Award number: 10696343
- Fiscal year: 2023
- Amount: $400,000
- Award type:
A Platform for Scalable Spatial Somatic Variant Profiling
- Award number: 10662761
- Fiscal year: 2023
- Amount: $400,000
- Award type:
BRAIN CONNECTS: PatchLink, scalable tools for integrating connectomes, projectomes, and transcriptomes
- Award number: 10665493
- Fiscal year: 2023
- Amount: $400,000
- Award type:
Developing a Scalable FASD-Informed Person-Centered Planning Intervention
- Award number: 10644186
- Fiscal year: 2023
- Amount: $400,000
- Award type: