Interactive reinforcement learning for adaptive experimental design
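Basic information
- Grant number: RGPIN-2020-06933
- Principal investigator:
- Amount: $17,500
- Host institution:
- Host institution country: Canada
- Project category: Discovery Grants Program - Individual
- Fiscal year: 2022
- Funding country: Canada
- Duration: 2022-01-01 to 2023-12-31
- Project status: Completed
- Source:
- Keywords: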
Project abstract
Unlike algorithms in supervised and unsupervised learning, which study static collections of existing data, algorithms in reinforcement learning (RL) choose actions that transform their environments and use the effects of those actions to improve their choices incrementally over time. The simplest instance of an RL setting is the bandit problem, in which an agent faces a single choice between a fixed set of actions and receives feedback from its environment within one time-step. Initially motivated by adaptive experimental designs, where one aims to compare several actions by conducting experiments in a maximally informative and efficient way, bandit algorithms have been studied intensively throughout the last decade. Beyond theoretical findings, these algorithms have also been successfully implemented in adaptive experiments, for example for optimizing cancer treatments in mouse trials, for tuning microscopy imaging systems, for adjusting hyperparameters, and for designing user interfaces. However, many real-world applications are characterized by dynamics that are not captured by existing RL settings. Even worse, these dynamics sometimes contradict assumptions required by current algorithms. In these cases, the algorithms' theoretical guarantees are voided, and they risk producing undesirable, potentially dangerous behaviours. Therefore, leveraging the power of interactive RL algorithms in practice requires approaches intended for these specific conditions. The proposed research program aims to bring RL to the real world of adaptive experimental design. We aim to develop algorithms that perform as expected, with theoretical guarantees that hold in these realistic environments. Through collaborations with researchers in other fields, we will deploy those strategies and measure their impact on real applications.
The research program is articulated around the following objectives: 1) Propose flexible interactive RL settings that encompass state-of-the-art adaptive experimental design frameworks; 2) Investigate realistic system dynamics and introduce theoretically grounded algorithms for learning under these constraints; 3) Characterize how undesirable behaviours arise in learning algorithms and develop strategies to guard against them; 4) Deploy algorithms in real-world applications to showcase their potential and impact the application domain. The proposed research direction has high potential for impact, as it will result in strategies developed for real application cases. Even though the program is focused on the bandit subfield, the resulting knowledge and algorithms will constitute a basis for understanding and designing sequential RL algorithms, which still essentially remain confined to simulation environments. Finally, the applications will constitute proofs of concept and will result in deployment guidelines for safe and impactful RL, supporting further research that will bring RL closer to the field.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Durand, Audrey
An economic evaluation: Simulation of the cost-effectiveness and cost-utility of universal prevention strategies against osteoporosis-related fractures.
- DOI: 10.1002/jbmr.1758
- Published: 2013-02
- Journal:
- Impact factor: 6.2
- Authors: Nshimyumukiza, Leon; Durand, Audrey; Gagnon, Mathieu; Douville, Xavier; Morin, Suzanne; Lindsay, Carmen; Duplantie, Julie; Gagne, Christian; Jean, Sonia; Giguere, Yves; Dodin, Sylvie; Rousseau, Francois; Reinharz, Daniel
- Corresponding author: Reinharz, Daniel
Pre-trial cocaine biases choice toward cocaine through suppression of the nondrug option
- DOI: 10.1016/j.pbb.2018.07.010
- Published: 2018-10-01
- Journal:
- Impact factor: 3.6
- Authors: Freese, Luana; Durand, Audrey; Ahmed, Serge H.
- Corresponding author: Ahmed, Serge H.
Informing the development of an outcome set and banks of items to measure mobility among individuals with acquired brain injury using natural language processing.
- DOI: 10.1186/s12883-022-02938-1
- Published: 2022-12-09
- Journal:
- Impact factor: 2.6
- Authors: Alhasani, Rehab; Godbout, Mathieu; Durand, Audrey; Auger, Claudine; Lamontagne, Anouk; Ahmed, Sara
- Corresponding author: Ahmed, Sara
Cost-effectiveness and accuracy of prenatal Down syndrome screening strategies: should the combined test continue to be widely used?
- DOI: 10.1016/j.ajog.2010.09.017
- Published: 2011-02-01
- Journal:
- Impact factor: 9.8
- Authors: Gekas, Jean; Durand, Audrey; Reinharz, Daniel
- Corresponding author: Reinharz, Daniel
The Influence of Age, Sex, and Socioeconomic Status on Glycemic Control Among People With Type 1 and Type 2 Diabetes in Canada: Patient-Led Longitudinal Retrospective Cross-sectional Study With Multiple Time Points of Measurement.
- DOI: 10.2196/35682
- Published: 2023-04-27
- Journal:
- Impact factor: 0
- Authors: Mousavi, Seyedmostafa; Tannenbaum Greenberg, Dana; Ndjaboue, Ruth; Greiver, Michelle; Drescher, Olivia; Chipenda Dansokho, Selma; Boutin, Denis; Chouinard, Jean-Marc; Dostie, Sylvie; Fenton, Robert; Greenberg, Marley; McGavock, Jonathan; Najam, Adhiyat; Rekik, Monia; Weisz, Tom; Willison, Donald J; Durand, Audrey; Witteman, Holly O
- Corresponding author: Witteman, Holly O
Other grants by Durand, Audrey
Interactive reinforcement learning for adaptive experimental design
- Grant number: RGPIN-2020-06933
- Fiscal year: 2021
- Funding amount: $17,500
- Project category: Discovery Grants Program - Individual
Interactive reinforcement learning for adaptive experimental design
- Grant number: RGPIN-2020-06933
- Fiscal year: 2020
- Funding amount: $17,500
- Project category: Discovery Grants Program - Individual
Interactive reinforcement learning for adaptive experimental design
- Grant number: DGECR-2020-00327
- Fiscal year: 2020
- Funding amount: $17,500
- Project category: Discovery Launch Supplement
Self-management of distributed resources in wireless sensor networks
- Grant number: 443420-2013
- Fiscal year: 2014
- Funding amount: $17,500
- Project category: Postgraduate Scholarships - Doctoral
Self-management of distributed resources in wireless sensor networks
- Grant number: 443420-2013
- Fiscal year: 2013
- Funding amount: $17,500
- Project category: Postgraduate Scholarships - Doctoral
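Similar National Natural Science Foundation of China (NSFC) grants
Testing for reinforcement in hybrid zones of Sonneratia and its genetic basis
- Grant number: 30800060
- Year approved: 2008
- Funding amount: 230,000 CNY
- Project category: Young Scientists Fund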
Similar overseas grants
III: Small: Deep Interactive Reinforcement Learning for Self-optimizing Feature Selection
- Grant number: 2152030
- Fiscal year: 2022
- Funding amount: $17,500
- Project category: Standard Grant
Interactive development of reinforcement learning and adaptive memory
- Grant number: 10618984
- Fiscal year: 2021
- Funding amount: $17,500
- Project category:
Interactive reinforcement learning for adaptive experimental design
- Grant number: RGPIN-2020-06933
- Fiscal year: 2021
- Funding amount: $17,500
- Project category: Discovery Grants Program - Individual
Interactive development of reinforcement learning and adaptive memory
- Grant number: 10426161
- Fiscal year: 2021
- Funding amount: $17,500
- Project category:
Interactive development of reinforcement learning and adaptive memory
- Grant number: 10200405
- Fiscal year: 2021
- Funding amount: $17,500
- Project category:
Interactive reinforcement learning for adaptive experimental design
- Grant number: RGPIN-2020-06933
- Fiscal year: 2020
- Funding amount: $17,500
- Project category: Discovery Grants Program - Individual
Interactive reinforcement learning for adaptive experimental design
- Grant number: DGECR-2020-00327
- Fiscal year: 2020
- Funding amount: $17,500
- Project category: Discovery Launch Supplement
Towards interactive explanatory reinforcement learning for aligned and trustworthy agents
- Grant number: 2314554
- Fiscal year: 2019
- Funding amount: $17,500
- Project category: Studentship
Curiosity-driven reinforcement learning algorithms for large scale interactive sculpture systems
- Grant number: 451938-2013
- Fiscal year: 2015
- Funding amount: $17,500
- Project category: Industrial Postgraduate Scholarships
Curiosity-driven reinforcement learning algorithms for large scale interactive sculpture systems
- Grant number: 451938-2013
- Fiscal year: 2014
- Funding amount: $17,500
- Project category: Industrial Postgraduate Scholarships