
BIGDATA: Causal Inference in Large-Scale Time Series with Rare and Latent Events


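Basic information

Grant number:
8852180
Principal investigator:
SAMANTHA KLEINBERG
Amount:
approx. $206,100
Host institution:
STEVENS INSTITUTE OF TECHNOLOGY
Host institution country:
United States
Project category:
Fiscal year:
2013
Funding country:
United States
Start and end dates:
2013-06-01 to 2016-05-31
Project status:
Completed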

Project abstract

DESCRIPTION (provided by applicant): With the increasing availability of large-scale datasets, such as those from intensive care units (ICUs), researchers face a flood of data that does not lead immediately to knowledge. Given its volume and frequency of collection (ICU patients are monitored every 5 seconds), many important events will be rare occurrences. Unlike the traditional approach of prospectively measuring a small set of variables hypothesized to be important, these observational datasets contain a large, unselected, and incomplete set of features. They can allow insight into cases where experiments are infeasible, but using them for decision-making requires new methods for finding the impact of rare events and hidden variables in complex time series, along with realistic simulated data for evaluation. This proposal addresses two main challenges of large-scale observational data: 1) evaluating the causal impact of rare events, and 2) identifying latent causes. First, we leverage the volume of data and the connection between type (general) and token (singular) causality to infer a model of how a system normally functions, and then determine whether rare events explain a deviation from usual behavior. The basic approach of comparing a model and observed instances forms the basis for finding latent variables, where we aim to find how much of a variable's value (or how many of its occurrences) is due to influences outside the dataset and to find shared causes for sets of variables. This is motivated by applications to neurological ICU (NICU) data streams, where the volume of continuous recordings of patients' brain activity and physiological signs surpasses clinicians' ability to find complex patterns in real time and use them for treatment. Further, clinicians need to know not just that a patient is having a seizure (a low-probability event with a potentially significant impact on outcomes), but whether it is causing harm before they can determine how to treat it.
To enable rigorous validation of the algorithms, we develop a new computational platform for generating simulated NICU time series data. The methods will improve understanding of seizures in stroke patients and will be broadly applicable to large-scale, high-resolution time series data, enabling discoveries in areas such as computational social science. RELEVANCE (See instructions): The methods developed will improve the translation of data to knowledge to policy by identifying actionable information on causes, enabling better and more rapid decision-making by clinicians. Creating and disseminating realistic simulated data will allow for comparison and validation of methods, facilitating computational advances by researchers in computer science and medicine.
