Reusing Data Efficiently for Iterative and Integrative Inference

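Basic Information

Award number:
2113342
Principal investigator:
Snigdha Panigrahi
Amount:
$150,000
Host institution:
Regents of the University of Michigan - Ann Arbor
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2021
Funding country:
United States
Start and end dates:
2021-08-01 to 2024-07-31
Project status:
Completed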

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2113342&HistoricalAwards=false
Keywords:
Reusing Data Efficiently Iterative Integrative

Project Abstract

Drawing knowledge and reproducible results from complex data drives a broad range of scientific disciplines. From a statistical viewpoint, model selection and inference are the two fundamental tasks, the latter often pursued only after models are chosen through data-driven procedures. Naively using the same data for both tasks creates complicated correlations between the selected models and their inferential properties, which inevitably affects the reproducibility of findings from these models. The investigator develops methods for reusing data from selection to compensate for these correlations while not squandering away information from the full data. Finding immediate use in biomedical problems, observational studies in the behavioral sciences, and engineering applications, the methods will aid discoveries even when analyses rely on scarce samples. This research has a broader outreach component in creating opportunities for interdisciplinary engagement, training statisticians, and contributing to a new graduate curriculum.

The project is geared towards efficient and reproducible inference through a reuse of data from the model selection steps. Combining ideas from convex optimization, probability theory, and statistical learning, the project seeks solutions for two main thrusts. In the first thrust, the investigator develops methods to integrate fresh samples available at a later point in time with information from selection. This workflow is realized in modern applications such as online streaming of data, which demand iterative inference on the fly. In the second thrust, the investigator explores integrative inference by combining selected models from different batches or splits or sources of data. Aggregating inference from multiple sources through a reuse of samples will have the potential for new discoveries that any single dataset may fail to report due to a lack of power.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
