
Sequential decision making under uncertainty: fundamental limits and applications

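Basic information

Grant number:
RGPIN-2020-04256
Principal investigator:
Song, Yanglei
Amount:
approx. $13,100
Host institution:
Queen's University
Host institution country:
Canada
Project category:
Discovery Grants Program - Individual
Fiscal year:
2021
Funding country:
Canada
Period:
2021-01-01 to 2022-12-31
Project status:
Completed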

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=739767
Keywords:
Sequential decision making under uncertainty

Project abstract

Sequential decision making (SDM) is an interactive process between a sequence of actions and observations: at each time, a decision-maker takes an action based on past data, which in turn affects the distribution of future observations. The problem is to devise a strategy for selecting actions that achieves an overall objective, such as reaching a reliable conclusion as fast as possible or maximizing the cumulative reward. SDM tasks arise in applications from many areas, including signal processing, clinical trials, personalized medicine, intelligent tutoring systems, online advertising, and recommendation systems. For these tasks, we aim to understand the fundamental limits and to propose matching algorithms that are also computationally efficient. The major challenge is the complex dependence structure among observations induced by adaptive actions, which calls for a different set of tools than those used for static data analysis. In this proposal, we investigate three themes in SDM with different formulations and applications.

The first theme is testing multiple hypotheses based on streaming data. In its simplest form, a decision-maker evaluates data as they arrive and stops sampling once the evidence is strong enough to resolve all the hypotheses. The goal is to minimize the sampling cost while controlling error rates in some family-wise sense. We will also study the cases where there are sampling constraints (for example, only a limited number of streams can be observed at each time) and/or different streams can have separate stopping times.

The second theme is influencing and quickly detecting a change-point. Motivated by applications such as online education, where the goal is to actively help students master skills over time by adaptively administering educational items, we will consider a framework in which the aim is to accelerate a hidden change and then detect it as soon as possible, subject to a false alarm constraint. The online procedures usually assume knowledge of the dynamics, and we will also study the problem of offline model estimation.

The third theme is the contextual bandit problem. Consider multiple treatments for a disease whose efficacy depends on patients' characteristics (the context), such as genes. When a new patient arrives, the doctor needs to select a treatment based on the context and past knowledge, and then observes the outcome of that treatment. The goal is to minimize the regret against an oracle who knows how the outcome depends on context and treatment. We will particularly consider the case where the context is high dimensional.

The expected research outcomes will significantly advance the understanding and practice of SDM. We will document research results in top journals and incorporate the methodology into publicly released software, such as R packages. This program will create and integrate educational opportunities for highly qualified personnel (HQP), and support the training of 3 PhD students, 3 MSc students, and 2 USRAs.
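
To make the "stop sampling once the evidence is strong enough" idea from the first theme concrete, the sketch below implements a classical single-stream sequential probability ratio test (Wald's SPRT) in Python. It is only an illustration of sequential testing in general, not the procedure proposed in this project: the Gaussian model with known variance, the function name `sprt_gaussian`, and all parameter values are assumptions chosen for the example, and the multi-stream, family-wise setting described in the abstract is not covered.

```python
import math
import random


def sprt_gaussian(stream, mu0=0.0, mu1=1.0, sigma=1.0, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: mean = mu0 versus H1: mean = mu1 (known sigma).

    Hypothetical helper for illustration only.
    Returns (decision, n_samples), where decision is "H0", "H1",
    or "undecided" if the stream ends before a threshold is crossed.
    """
    # Wald's approximate thresholds for target error rates alpha and beta.
    upper = math.log((1 - beta) / alpha)   # decide H1 at or above this value
    lower = math.log(beta / (1 - alpha))   # decide H0 at or below this value

    llr = 0.0  # running log-likelihood ratio of H1 against H0
    n = 0
    for x in stream:
        n += 1
        # Log-likelihood ratio contribution of one Gaussian observation.
        llr += (mu1 - mu0) * (x - (mu0 + mu1) / 2) / sigma**2
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", n


if __name__ == "__main__":
    # Simulated data drawn under H1 (assumed values, for demonstration only).
    random.seed(0)
    data = (random.gauss(1.0, 1.0) for _ in range(1000))
    decision, n_used = sprt_gaussian(data)
    print(decision, n_used)
```

The number of observations used is random and typically much smaller than the length of the stream, which is the sampling-cost saving that sequential procedures aim for.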
