
CAREER: Learning from Observational Data with Knowledge

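Basic Information

Award number:
1347119
Principal investigator:
Samantha Kleinberg
Amount:
approx. $529,100
Awardee institution:
Stevens Institute of Technology
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2014
Funding country:
United States
Project period:
2014-05-01 to 2020-04-30
Project status:
Completed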

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1347119&HistoricalAwards=false
Keywords:
CAREER Learning Observational Data Knowledge

Project Abstract

Large observational datasets from social networks, climatology, finance, and other areas have made it possible for researchers to test complex hypotheses that previous studies would have been under-powered to tackle. This is especially true in biology and health, with the proliferation of new methods for gathering long-term population data, such as from electronic medical records, and real-world health data from body-worn sensors. However, the number of complex hypotheses that can be tested in datasets with hundreds or thousands of variables far surpasses what humans can propose and reason about. Exhaustively testing all possible relationships is not computationally feasible, and after this testing a researcher must still examine a non-trivial number of seemingly significant findings to determine which still need to be validated experimentally. This project aims specifically to infer causal relationships, as these provide insight into not only how a system behaves, but also why it behaves as it does, enabling the development of successful interventions. Results from this work will be incorporated into education at three levels (high school, undergraduate, and graduate) through university courses and summer programs for high school students. In addition to communicating the core concepts of causal inference, the summer programs will also introduce potential computer scientists to key areas of computer science research. Applications of the methods developed to data from stroke and diabetes may lead to new knowledge about the physiologic processes underlying recovery in stroke, and the complex interaction of factors affecting glucose in people with diabetes.
This work will lead to more robust and efficient inference of causal relationships from large-scale datasets, through a feedback loop between experiments and prior knowledge. Current approaches require users to specify the set of variables and hypotheses to be tested, but this limits findings to the set a user chose to explore. Instead, this work will develop methods that can use prior knowledge in the form of causal relationships as well as prior experimental results to constrain what will be tested and generate new hypotheses. Causes provide information about their effects that is not contained in other variables, so this work will develop measures of how explanatory a cause is and how much information it yields, and use changes in these measures to guide generation of complex relationships in the constrained hypothesis space. The proposed approach differs from stochastic heuristics in that the new method will be deterministic, and will evaluate relationships individually, thus addressing the computational challenge and reducing the impact of incorrect inference. Second, the work will lead to algorithms that can automatically evaluate how findings relate to prior knowledge, whether they are, for example, consistent, novel, or contradictory. This will allow researchers to focus more in depth on findings likely to be significant or interesting, rather than those that simply confirm prior knowledge. It also provides a feedback loop between knowledge and inference.
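
The idea of labeling findings as consistent, novel, or contradictory relative to prior knowledge can be illustrated with a minimal sketch. This is not the project's algorithm: the signed-claim representation, the function name `classify_finding`, and the variable names are all assumptions made for illustration, and the entries are made-up placeholders rather than results.

```python
from typing import Dict, Tuple

# Hypothetical representation: a causal claim is a (cause, effect) pair,
# mapped to +1 if the cause is believed to raise the effect, -1 if it lowers it.
Claim = Tuple[str, str]


def classify_finding(claim: Claim, sign: int, prior: Dict[Claim, int]) -> str:
    """Label one inferred relationship relative to prior knowledge."""
    if claim not in prior:
        return "novel"  # nothing known about this pair
    return "consistent" if prior[claim] == sign else "contradictory"


# Illustrative placeholder entries only, not findings from the project.
prior_knowledge = {
    ("insulin", "glucose"): -1,
    ("carbohydrate_intake", "glucose"): +1,
}
findings = [
    (("insulin", "glucose"), -1),
    (("carbohydrate_intake", "glucose"), -1),
    (("exercise", "glucose"), -1),
]

# Each finding is evaluated on its own, independent of the others.
labels = {claim: classify_finding(claim, sign, prior_knowledge)
          for claim, sign in findings}

for claim, label in labels.items():
    print(claim, label)
```

In a sketch like this, a researcher could filter out the "consistent" labels and review only the novel or contradictory relationships first.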
