Distance-based variable selection for high-dimensional biological data
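Basic information
- Award number: 1313224
- Principal investigator:
- Amount: $150,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2013
- Funding country: United States
- Period: 2013-09-15 to 2016-08-31
- Status: Completed
- Source:
- Keywords:
Project abstract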
The overall objective of the research project is to provide statisticians and biological scientists with new and improved statistical tools for measuring variable importance and selecting key variables in high-dimensional biological data. The two major ingredients underlying the research are a distance-based procedure across multiple dimensions and a tilting/weighting-based importance measure for any specific dimension. The distance-based methods, e.g., the multi-response permutation procedure and the distance covariance, can handle data even when the number of dimensions exceeds the sample size. The tilting/weighting-based procedures allow variable importance to be evaluated for any dimension in the presence of any number of other variables. Thus, variable importance is evaluated in the multivariate context rather than on univariate marginal distributions. In addition, the new methods will allow the number of selected variables to exceed the sample size; allow forward selection, backward selection, and sparse penalized weighting; minimize perturbation to the dependence structures actually present in the data; require minimal structural assumptions; and be sensitive to a wide range of multivariate dependencies, including some that are difficult or even impossible to detect with existing methods.
The methods developed as part of this project have a wide range of applications in the biomedical and agricultural industries. Modern genomics tools allow researchers to simultaneously measure thousands of variables that contain information about DNA, RNA, and protein characteristics of organisms. The high-dimensional data generated by these modern high-throughput technologies must be mined to identify the variables that are most associated with health outcomes or other important traits. Uncovering such associations is crucial in a variety of areas, including drug discovery, genetic risk analysis, personalized medicine, and plant and animal breeding. This research project will provide tools to help make these discoveries possible. Reliable software implementations of the new methods will be created, maintained, archived in public repositories, and freely disseminated to genomics researchers and industry practitioners working with a diverse range of organisms and different high-throughput technologies. The research activity will enhance collaborations and partnerships among researchers from both computational/statistical fields and experimental/biomedical fields.
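As an illustration of why distance-based measures remain usable when the number of dimensions exceeds the sample size, the following is a minimal sketch of the standard sample distance covariance and distance correlation, computed from pairwise Euclidean distances between observations. It is not the project's software: the function names, the simulated data, and the choice of the plain (V-statistic) estimator are assumptions made for illustration only.

```python
import numpy as np


def _centered_distances(z):
    """Double-centered matrix of pairwise Euclidean distances between rows."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # treat a vector as n observations of one variable
    diff = z[:, None, :] - z[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))  # n x n, whatever the dimension p
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def distance_covariance_sq(x, y):
    """Squared sample distance covariance (V-statistic form)."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    return float((a * b).mean())


def distance_correlation(x, y):
    """Sample distance correlation; defined as 0 if either side is constant."""
    dcov2 = distance_covariance_sq(x, y)
    denom = np.sqrt(distance_covariance_sq(x, x) * distance_covariance_sq(y, y))
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


# Hypothetical data: 30 samples, 200 variables, so p > n.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 200))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=30)  # nonlinear dependence on one variable

print("dcor(X, y):", distance_correlation(X, y))
print("dcor(X[:, 0], y):", distance_correlation(X[:, 0], y))
```

Because only the n x n distance matrices enter the statistic, the number of variables affects the cost of computing distances but not the size of the objects being compared. In practice a permutation test on `y` would typically be used to judge significance.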
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Daniel Nettleton
Conference on Predictive Inference and Its Applications
- Award number: 1810945
- Fiscal year: 2018
- Amount: $150,000
- Award type: Standard Grant
Joint NSF/ERA-CAPS: Host Targets of Fungal Effectors as Keys to Durable Disease Resistance
- Award number: 1339348
- Fiscal year: 2014
- Amount: $150,000
- Award type: Continuing Grant
Development of High-Dimensional Data Analysis Methods for the Identification of Differentially Expressed Gene Sets
- Award number: 0714978
- Fiscal year: 2007
- Amount: $150,000
- Award type: Continuing Grant
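Similar grants from the National Natural Science Foundation of China (NSFC)
Data-driven Recommendation System Construction of an Online Medical Platform Based on the Fusion of Information
- Award number:
- Approval year: 2024
- Amount:
- Award type: Research Fund for International Young Scientists
Incentive and governance mechanism study of corporate greenwashing behavior in China: Based on an integrated view of econfiguration of environmental authority and decoupling logic
- Award number:
- Approval year: 2024
- Amount:
- Award type: Research Fund for International Scientists
Exploring the Intrinsic Mechanisms of CEO Turnover and Market Reaction: An Explanation Based on Information Asymmetry
- Award number: W2433169
- Approval year: 2024
- Amount:
- Award type: Research Fund for International Scientists
In situ dynamic study of the nucleation and growth mechanism of TCP phases in advanced Re- and Ru-containing nickel-based single-crystal superalloys
- Award number: 52301178
- Approval year: 2023
- Amount: CNY 300,000
- Award type: Young Scientists Fund
Effect of chemical inhomogeneity on irradiation behavior in NbZrTi-based multi-principal-element alloys
- Award number: 12305290
- Approval year: 2023
- Amount: CNY 300,000
- Award type: Young Scientists Fund
Population-based epidemiological study of how the ocular surface microbiota influences the onset of dry eye in patients with diabetes
- Award number: 82371110
- Approval year: 2023
- Amount: CNY 490,000
- Award type: General Program
Evolution mechanism of irradiation-induced dislocation loops in nickel-based UNS N10003 alloy and its effect on mechanical properties
- Award number: 12375280
- Approval year: 2023
- Amount: CNY 530,000
- Award type: General Program
Structural characteristics and structure-property relationships of CuAgSe-based thermoelectric materials
- Award number: 22375214
- Approval year: 2023
- Amount: CNY 500,000
- Award type: General Program
A study on prototype flexible multifunctional graphene foam-based sensing grid
- Award number:
- Approval year: 2020
- Amount: CNY 200,000
- Award type:
Big-data-based quantitative study of the impact of urbanization on seasonal influenza transmission in China and its mechanisms
- Award number: 82003509
- Approval year: 2020
- Amount: CNY 240,000
- Award type: Young Scientists Fund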
Similar overseas grants
Multi-variable based vegetation monitoring and prediction during droughts
- Award number: FT230100209
- Fiscal year: 2024
- Amount: $150,000
- Award type: ARC Future Fellowships
Orthogonalization-based manipulated-variable ranking for identifying and addressing gain-conditioning problems in multivariable control systems
- Award number: 567501-2021
- Fiscal year: 2022
- Amount: $150,000
- Award type: Alliance Grants
Modeling and machine learning-based control of a continuously-variable transmission system
- Award number: 570764-2021
- Fiscal year: 2022
- Amount: $150,000
- Award type: Alliance Grants
Novel p-Value Based Multiple Testing Methods for Variable Selection with False Discovery Rate Control
- Award number: 2210687
- Fiscal year: 2022
- Amount: $150,000
- Award type: Standard Grant
Simulation-based Methods for Large Dynamic Latent Variable Models with Unobserved Heterogeneity
- Award number: RGPIN-2020-04161
- Fiscal year: 2022
- Amount: $150,000
- Award type: Discovery Grants Program - Individual
Modeling and machine learning-based control of a continuously-variable transmission system
- Award number: 570764-2021
- Fiscal year: 2021
- Amount: $150,000
- Award type: Alliance Grants
Orthogonalization-based manipulated-variable ranking for identifying and addressing gain-conditioning problems in multivariable control systems
- Award number: 567501-2021
- Fiscal year: 2021
- Amount: $150,000
- Award type: Alliance Grants
A novel monoclonal antibody-based anti-NK cell anti-inflammatory strategy for treating autoimmune and checkpoint inhibitor induced myocarditis
- Award number: 10258059
- Fiscal year: 2021
- Amount: $150,000
- Award type:
High-throughput identification of antibody features for sequence-based epitope prediction
- Award number: 10243575
- Fiscal year: 2021
- Amount: $150,000
- Award type:
Simulation-based Methods for Large Dynamic Latent Variable Models with Unobserved Heterogeneity
- Award number: RGPIN-2020-04161
- Fiscal year: 2021
- Amount: $150,000
- Award type: Discovery Grants Program - Individual