Scalable Bayesian Inference for Interpretable Time-Series Models

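Basic Information

Award number:
1544628
Principal investigator:
Finale Doshi-Velez
Amount:
$74.1K
Host institution:
Harvard University
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2015
Funding country:
United States
Period:
2015-07-01 to 2016-06-30
Status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1544628&HistoricalAwards=false
Keywords: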
Scalable Bayesian Inference Interpretable Time

Project Abstract

From healthcare to retail, from governments to education, we are collecting and storing data. These data sources provide unprecedented opportunities: healthcare data stores, originally collected for billing purposes, can be mined to better understand diseases and improve treatments; government data stores, originally collected for reporting purposes, can be mined to improve national welfare and security; retail data stores, originally collected for accounting purposes, can be used to detect complex fraud and improve the customer experience. In particular, these large data stores allow us to understand the patterns of patients, customers, citizens, and students over time. Probabilistic models for time-series analysis can recover patterns such as disease trajectories and purchasing needs.

However, using these data, which were generally collected and stored for other purposes, to better understand these patterns is challenging for several reasons. These data are typically stored in standard relational databases and subject to complex security protections. They are also often biased and incomplete; one can use them to discover interesting patterns, but the results must be used with caution. This proposal takes steps toward addressing these core problems. In particular, we propose to create methods for analyzing time series that can run efficiently on existing data management architectures and provide interpretable results that a domain expert can vet for accuracy. We apply these approaches to understanding disease progression in diabetes and psychiatric diseases through the analysis of electronic health records.

From a technical point of view, this work focuses on a particular probabilistic time-series model, the Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM). Inference in this model is typically performed in a numerical computing environment using Markov Chain Monte Carlo (MCMC). The first part of our work involves adapting the MCMC steps so that they can be run efficiently within traditional database architectures. Specifically, we will develop methods that do not require the data to be removed from the database, by splitting the inference into computations that operate directly on the data (performed within the database) and computations that operate on derived statistics (performed in a numerical computing environment).

The second part of our work involves making models learned from sparse, high-dimensional data more interpretable. Here we will leverage the fact that while these data stores are typically high-dimensional (tens of thousands of dimensions), these dimensions are not all independent; in fact, in most real applications there exists knowledge about how these data are structured. We will apply these knowledge bases to create sparse models that are easier for domain experts to interpret. While we focus on our healthcare application for this work, these techniques are relevant to a variety of applications involving time series with high-dimensional, sparsely sampled data.
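
The split between in-database computation and computation on derived statistics can be illustrated with a small sketch. This is not the project's method: it assumes a finite-state HMM with a symmetric Dirichlet prior (a simplification of the HDP-HMM), a hypothetical `assignments` table holding the current hidden-state assignment for each time step, and an in-memory SQLite database standing in for the production database. All table, column, and function names are illustrative. The SQL query reduces the raw rows to a small table of transition counts, and only those counts are passed to the sampling step.

```python
import random
import sqlite3

# Hypothetical schema: one row per (sequence, time step) with the hidden
# state currently assigned to it. Names and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assignments (seq_id INTEGER, t INTEGER, state INTEGER)")
conn.executemany(
    "INSERT INTO assignments VALUES (?, ?, ?)",
    [
        (1, 0, 0), (1, 1, 0), (1, 2, 1), (1, 3, 1),
        (2, 0, 1), (2, 1, 0), (2, 2, 0),
    ],
)

# Step 1 (inside the database): count state-to-state transitions with a
# self-join, so only a K x K table of counts leaves the database.
COUNT_SQL = """
SELECT a.state AS src, b.state AS dst, COUNT(*) AS n
FROM assignments AS a
JOIN assignments AS b
  ON a.seq_id = b.seq_id AND b.t = a.t + 1
GROUP BY a.state, b.state
"""


def transition_counts(connection, num_states):
    """Run the aggregation in SQL and return a dense count matrix."""
    counts = [[0] * num_states for _ in range(num_states)]
    for src, dst, n in connection.execute(COUNT_SQL):
        counts[src][dst] = n
    return counts


# Step 2 (numerical environment): sample each row of the transition matrix
# from its Dirichlet posterior, using only the derived counts.
def sample_dirichlet(params, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]


def sample_transition_matrix(counts, alpha, rng):
    """Posterior for row i is Dirichlet(alpha + counts[i]) under a symmetric prior."""
    return [sample_dirichlet([alpha + n for n in row], rng) for row in counts]


counts = transition_counts(conn, num_states=2)  # [[2, 1], [1, 1]]
rng = random.Random(0)
trans = sample_transition_matrix(counts, alpha=1.0, rng=rng)
```

In a full sampler the state assignments would themselves be resampled on each iteration; the sketch shows only the one step where the raw data can stay in the database while the numerical environment works on summary counts.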

Project Outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Finale Doshi-Velez

How machine-learning recommendations influence clinician treatment selections: the example of antidepressant selection

DOI:
10.1038/s41398-021-01224-x
Published:
2021-02-04
Journal:
Translational Psychiatry
Impact factor:
6.200
Authors:
Maia Jacobs; Melanie F. Pradier; Thomas H. McCoy; Roy H. Perlis; Finale Doshi-Velez; Krzysztof Z. Gajos
Corresponding author:
Krzysztof Z. Gajos

Ethical and regulatory challenges of large language models in medicine

DOI:
10.1016/s2589-7500(24)00061-x
Published:
2024-06-01
Journal:
Lancet Digital Health
Impact factor:
24.100
Authors:
Jasmine Chiat Ling Ong; Shelley Yin-Hsi Chang; Wasswa William; Atul J Butte; Nigam H Shah; Lita Sui Tjien Chew; Nan Liu; Finale Doshi-Velez; Wei Lu; Julian Savulescu; Daniel Shu Wei Ting
Corresponding author:
Daniel Shu Wei Ting

Association between prescriber practices and major depression treatment outcomes

DOI:
10.1016/j.xjmad.2024.100080
Published:
2024-12-01
Journal:
Research article
Impact factor:
Authors:
Sarah Rathnam; Abhishek Sharma; Kamber L. Hart; Pilar F. Verhaak; Thomas H. McCoy; Roy H. Perlis; Finale Doshi-Velez
Corresponding author:
Finale Doshi-Velez