Robust and scalable Bayesian inference under model misspecification

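Basic information

Grant number:
2435792
Principal investigator:
Amount:
--
Host institution:
University of Warwick
Host institution country:
United Kingdom
Project category:
Studentship
Fiscal year:
2020
Funding country:
United Kingdom
Start and end dates:
2020 to (no data)
Project status:
Not concluded

Source:
https://gtr.ukri.org/projects?ref=studentship-2435792
Keywords:
Robust scalable Bayesian inference under

Project abstract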

Inspired by recent advances in model misspecification, generalised Bayesian inference and approximate inference, I propose to advance the state of the art in computational statistics and probabilistic machine learning by introducing a methodological framework, algorithms, and associated computational and statistical theory for performing robust inference over a universe of potentially misspecified models and their parameters in large-scale settings. My research is motivated by impactful real-world applications of statistical machine learning in the medical and social sciences.

Recent research on model misspecification has examined how we can be robust to misspecified likelihoods, misspecified prior beliefs and data outliers, given a chosen model family and some measure of fit or generalisation. This is primarily a post-hoc perspective that attempts to protect algorithms and inference against natural sources of misspecification. The project attempts to go beyond post-hoc corrections and formally embed into our algorithms and mathematical thinking any available information about the true data-generating process.

For example, the project has looked at a robust inference method applicable to simulator-based models. Inference with such models is challenging because sampling is possible but the likelihood function is unavailable. Furthermore, simulator-based models often describe complicated physical or biological phenomena and hence can easily be misspecified in practice. That is, they provide only a rough approximation of a real-world phenomenon, and if this approximation deviates far enough from the true data-generating mechanism, inference outcomes can be misleading.
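To make "sampling is possible but the likelihood function is unavailable" concrete, here is a minimal sketch of one such model. The g-and-k distribution is not named in the abstract; it is used here only as a common illustration, because it is defined through its quantile function and its density has no closed form. The function names and parameter values are illustrative assumptions.

```python
import numpy as np

def gk_quantile(z, a, b, g, k, c=0.8):
    """g-and-k quantile function evaluated at standard normal quantiles z.

    a: location, b: scale, g: skewness, k: tail heaviness.
    c = 0.8 is the conventional fixed constant.
    """
    # (1 - exp(-g z)) / (1 + exp(-g z)) is the same as tanh(g z / 2).
    skew = 1.0 + c * np.tanh(g * z / 2.0)
    return a + b * skew * (1.0 + z ** 2) ** k * z

def simulate_gk(n, a, b, g, k, rng):
    """Draw n samples by pushing standard normal draws through the quantile function."""
    z = rng.standard_normal(n)
    return gk_quantile(z, a, b, g, k)

# Sampling is a one-liner, yet there is no closed-form density to plug into
# a standard likelihood-based Bayesian update.
rng = np.random.default_rng(0)
samples = simulate_gk(1000, a=3.0, b=1.0, g=2.0, k=0.5, rng=rng)
print(samples[:5])
```

With g = 0 and k = 0 the simulator reduces to a Normal(a, b^2) sampler, which gives a quick sanity check on the implementation.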
Model misspecification was recently examined in a series of papers proposing the Bayesian Nonparametric Learning (NPL) framework, which is based on the idea that uncertainty should be placed directly on the data-generating mechanism rather than on a parameter of interest, as is usual in traditional Bayesian methodology. The project's proposed method combines this framework with Maximum Mean Discrepancy estimators to provide a robust method suitable for likelihood-free inference. The resulting approach is novel and computationally efficient, and comes with theoretical guarantees.

A different type of model misspecification, widely studied in statistical inference, is measurement error in one of the independent variables. This problem, also called the errors-in-variables or input uncertainty problem, arises often in economics, medicine and the natural sciences, where it is often hard or impossible to measure real-world quantities exactly. Measurement error in the predictor variable can lead to biased parameter estimates and potentially misleading inference outcomes. For example, in many causal inference problems the aim is to estimate the causal effect of one random variable on another, so a biased estimate gives a false picture of how the predictor variable affects the outcome variable. Such causal effect estimation problems arise in health sciences and epidemiology, where scientists are interested in exposure-outcome relations. This project aims to explore this direction by adapting the NPL framework to Berkson and classical measurement error settings and empirically validating the proposed methodology on real-world applications in health and nutritional sciences.

Finally, the project explores model misspecification in the context of Distributionally Robust Optimisation (DRO). The goal is to make robust decisions with respect to a variable while accounting for likelihood misspecification through a worst-case analysis.
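The worst-case analysis described above is commonly written as a min-max problem. The following is a generic sketch only; all symbols are assumed here rather than taken from the project: a decision $x$ in a feasible set $\mathcal{X}$, an uncertain quantity $\xi$, a cost function $f$, and an ambiguity set $\mathcal{U}$ of candidate distributions for $\xi$.

```latex
\[
  \min_{x \in \mathcal{X}} \; \sup_{Q \in \mathcal{U}} \;
  \mathbb{E}_{\xi \sim Q}\bigl[ f(x, \xi) \bigr]
\]
```

The larger the ambiguity set $\mathcal{U}$, the more conservative the resulting decision, which is why the choice of that set matters.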
Using an NPL posterior in place of a standard Bayesian posterior in this setting can potentially yield less conservative decisions than traditional DRO methods while still accounting for model misspecification.
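The combination of NPL-style uncertainty on the data-generating mechanism with Maximum Mean Discrepancy (MMD) estimators, mentioned earlier for likelihood-free inference, can be sketched on a toy problem. This is a simplified illustration under stated assumptions, not the project's algorithm: uncertainty is represented by Dirichlet(1, ..., 1) weights on the observations (a Bayesian bootstrap), the kernel is Gaussian with unit lengthscale, the simulator is a hypothetical Gaussian location model, and the minimisation is a plain grid search.

```python
import numpy as np

def gaussian_kernel(x, y, lengthscale=1.0):
    """Gaussian kernel matrix between two 1-D samples."""
    diff = x[:, None] - y[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def location_simulator(theta, noise):
    """Hypothetical simulator: shifts fixed standard normal draws by theta."""
    return theta + noise

def mmd_posterior_bootstrap(data, thetas, simulator, n_boot=200, n_sim=200, seed=0):
    """Return n_boot draws of theta, each minimising a weighted MMD^2 over the grid."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_sim)  # reused for every theta (common random numbers)

    # Theta-dependent parts of MMD^2, computed once per grid point.
    sim_term = np.empty(len(thetas))            # average of k(y, y') over simulated pairs
    cross = np.empty((len(thetas), len(data)))  # average of k(x_i, y) over simulated y
    for j, theta in enumerate(thetas):
        y = simulator(theta, noise)
        sim_term[j] = gaussian_kernel(y, y).mean()
        cross[j] = gaussian_kernel(data, y).mean(axis=1)

    draws = np.empty(n_boot)
    for b in range(n_boot):
        # Assumption: Bayesian-bootstrap weights stand in for the nonparametric posterior.
        w = rng.dirichlet(np.ones(len(data)))
        # MMD^2 between the weighted data and the simulated sample, dropping the
        # term w' K_xx w because it does not depend on theta.
        losses = sim_term - 2.0 * (cross @ w)
        draws[b] = thetas[np.argmin(losses)]
    return draws

# Toy data: 90 inliers around 1.0 plus 10 gross outliers at 10.0.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(1.0, 1.0, size=90), np.full(10, 10.0)])
thetas = np.linspace(-2.0, 12.0, 141)
draws = mmd_posterior_bootstrap(data, thetas, location_simulator)

print("sample mean:", data.mean())
print("bootstrap mean and std:", draws.mean(), draws.std())
```

In this toy setup the sample mean is pulled toward the outliers, whereas the bootstrap draws should concentrate near the inlier location, since points far from the simulated sample contribute almost nothing to a Gaussian-kernel MMD. A grid search is used only to keep the sketch short; a gradient-based optimiser would be the natural choice for higher-dimensional parameters.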
