Martingale Control of mFDR in Variable Selection
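Basic Information
- Award number: 1106743
- Principal investigator:
- Amount: $320,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2011
- Funding country: United States
- Duration: 2011-07-01 to 2014-06-30
- Project status: Completed
- Source:
- Keywords:
Project Abstract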
The investigators in this project develop methods that control the selection of predictive features from multiple sources when building statistical models. A martingale representation of the number of spurious variables provides the underlying theoretical support. This martingale defines a framework for testing a possibly infinite sequence of hypotheses, and it leads to methods for streaming feature selection that control the marginal false discovery rate (mFDR), a criterion based on the expected number of false discoveries. Extensions developed in this project generalize the investigators' prior work to multiple streams of potential features while maintaining the martingale representation. Whereas the previous work addressed the high-noise, low-signal setting in which few features are predictive (the nearly black setting), advances in this proposal push these methods into problems characterized by many predictive features with higher signal-to-noise ratios. The proposal envisions replacing the original martingale with one directly related to the goodness of fit of the model. The investigators plan to use this revised martingale to show that an auction-based system that combines several sources of features satisfies the mFDR condition.
The investigators develop novel methods for building predictive statistical models that combine and learn from multiple sources of information. A predictive statistical model is an empirical rule constructed from data that predicts a specific characteristic of observations, the response, based on the values of other characteristics. The challenge of building these models is to identify characteristics that yield predictive insights. While ever larger amounts of data are an essential input to a statistical model, the presence of vast numbers of characteristics leads to the problem of over-fitting. Over-fitting occurs when one confuses a random coincidence among characteristics with a reproducible pattern. Modern data mining produces such a plethora of characteristics that it becomes difficult to distinguish real from imaginary associations. The investigators propose a system that makes these distinctions in the context of a common modeling paradigm. As a practical testbed, the investigators will analyze classic computational linguistic problems using regression analysis, the workhorse method of applied statistics. Given the extent of experience in linguistics, any deficiencies of a regression model will stand out. This will encourage innovations in regression that preserve its simplicity while competing with handcrafted methods in linguistics. These innovations should extend to other applications, including fMRI, genetics, and more general data mining.
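The streaming selection idea described in the abstract can be illustrated with a small sketch. The abstract does not spell out the testing procedure, so the code below assumes an alpha-investing-style rule: the tester holds a budget ("wealth"), spends part of it on each test, and earns a payout on each rejection. The function name, the default parameters, and the particular spending rule are all hypothetical choices made for illustration, not the investigators' implementation.

```python
from typing import Iterable, List


def alpha_investing(p_values: Iterable[float],
                    initial_wealth: float = 0.05,
                    payout: float = 0.05) -> List[int]:
    """Return indices of hypotheses rejected by a simple alpha-investing-style rule.

    All names and defaults here are illustrative assumptions.
    """
    wealth = initial_wealth
    rejected: List[int] = []
    last_rejection = -1  # index of the most recent rejection; -1 means none yet

    for j, p in enumerate(p_values):
        if wealth <= 0:
            break  # budget exhausted: no further tests can be afforded

        # Hypothetical spending rule: bid a share of the current wealth that
        # shrinks with the number of tests since the last rejection.
        alpha_j = wealth / (2.0 * (j - last_rejection))
        # Cap the bid so that a failed test cannot push wealth below zero.
        alpha_j = min(alpha_j, wealth / (1.0 + wealth))

        if p <= alpha_j:
            # Rejection: record the feature and earn the payout.
            rejected.append(j)
            wealth += payout
            last_rejection = j
        else:
            # No rejection: pay for the test.
            wealth -= alpha_j / (1.0 - alpha_j)

    return rejected


if __name__ == "__main__":
    stream = [0.001, 0.8, 0.5, 0.0001, 0.9]
    print(alpha_investing(stream))
```

Because each test only needs the current wealth and the next p-value, the rule never has to know how many features remain in the stream, which is what makes it suitable for a possibly infinite sequence of hypotheses.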
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Robert Stine
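Similar NSFC (National Natural Science Foundation of China) grants
Cortical control of internal state in the insular cortex-claustrum region
- Award number:
- Approval year: 2020
- Funding amount: 250,000 yuan
- Project type: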
Similar overseas grants
CAREER: Elucidating Biogenic Control of Heterogenous Ice Nucleation
- Award number: 2336558
- Fiscal year: 2024
- Funding amount: $320,000
- Project type: Continuing Grant
CAREER: Resilient and Efficient Automatic Control in Energy Infrastructure: An Expert-Guided Policy Optimization Framework
- Award number: 2338559
- Fiscal year: 2024
- Funding amount: $320,000
- Project type: Standard Grant
CAREER: Data-Enabled Neural Multi-Step Predictive Control (DeMuSPc): a Learning-Based Predictive and Adaptive Control Approach for Complex Nonlinear Systems
- Award number: 2338749
- Fiscal year: 2024
- Funding amount: $320,000
- Project type: Standard Grant
Molecular Control of Thermomechanics and Shape-Morphing of Dynamic Covalent Polymer Networks
- Award number: 2406256
- Fiscal year: 2024
- Funding amount: $320,000
- Project type: Standard Grant
CAREER: Facilitating Autonomy of Robots Through Learning-Based Control
- Award number: 2422698
- Fiscal year: 2024
- Funding amount: $320,000
- Project type: Continuing Grant
PZT-hydrogel integrated active non-Hermitian complementary acoustic metamaterials with real time modulations through feedback control circuits
- Award number: 2423820
- Fiscal year: 2024
- Funding amount: $320,000
- Project type: Standard Grant
Collaborative Research: How do plants control sperm nuclear migration for successful fertilization?
- Award number: 2334517
- Fiscal year: 2024
- Funding amount: $320,000
- Project type: Standard Grant
Collaborative Research: NSF-BSF: How cell adhesion molecules control neuronal circuit wiring: Binding affinities, binding availability and sub-cellular localization
- Award number: 2321481
- Fiscal year: 2024
- Funding amount: $320,000
- Project type: Continuing Grant
Collaborative Research: NSF-BSF: How cell adhesion molecules control neuronal circuit wiring: Binding affinities, binding availability and sub-cellular localization
- Award number: 2321480
- Fiscal year: 2024
- Funding amount: $320,000
- Project type: Continuing Grant
CAREER: A Universal Framework for Safety-Aware Data-Driven Control and Estimation
- Award number: 2340089
- Fiscal year: 2024
- Funding amount: $320,000
- Project type: Standard Grant


















