Interactive reinforcement learning for adaptive experimental design

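Basic information

  • Grant number:
    RGPIN-2020-06933
  • Principal investigator:
  • Amount:
    $17,500
  • Host institution:
  • Host institution country:
    Canada
  • Project type:
    Discovery Grants Program - Individual
  • Fiscal year:
    2020
  • Funding country:
    Canada
  • Project period:
    2020-01-01 to 2021-12-31
  • Project status:
    Completed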

Project abstract

Unlike algorithms in supervised and unsupervised learning, which study static collections of existing data, algorithms in reinforcement learning (RL) choose actions to transform their environments and use the effects of those actions to improve their choices incrementally over time. The simplest instance of an RL setting is the bandit problem, in which an agent faces a single choice among a fixed set of actions and receives feedback from its environment within one time-step. Initially motivated by adaptive experimental designs, where one aims to compare several actions by conducting experiments in a maximally informative and efficient way, bandit algorithms have been studied intensively throughout the last decade. Beyond theoretical findings, these algorithms have also been successfully implemented in adaptive experiments, for example for optimizing cancer treatments in mouse trials, for tuning microscopy imaging systems, for adjusting hyperparameters, and for designing user interfaces. However, many real-world applications are characterized by dynamics that are not captured by existing RL settings. Even worse, these dynamics sometimes contradict assumptions required by current algorithms. In these cases, the algorithms' theoretical guarantees are void, and they risk producing undesirable, potentially dangerous behaviours. Therefore, leveraging the power of interactive RL algorithms in practice requires approaches intended for these specific conditions. The proposed research program aims to bring RL to the real world of adaptive experimental design. We aim to develop algorithms that perform as expected, with theoretical guarantees that hold in these realistic environments. Through collaborations with researchers in other fields, we will deploy those strategies and measure their impact on real applications.
The research program is organized around the following objectives: 1) Propose flexible interactive RL settings that encompass state-of-the-art adaptive experimental design frameworks; 2) Investigate realistic system dynamics and introduce theoretically grounded algorithms for learning under these constraints; 3) Characterize how undesirable behaviours arise in learning algorithms and develop strategies to guard against them; 4) Deploy algorithms in real-world applications to showcase their potential and impact the application domain. The proposed research direction has high potential for impact, as it will result in strategies developed for real application cases. Even though the program is focused on the bandit subfield, the resulting knowledge and algorithms will constitute a basis for understanding and designing sequential RL algorithms, which remain largely confined to simulation environments. Finally, the applications will constitute proofs of concept and will result in deployment guidelines for safe and impactful RL, supporting further research that will bring RL closer to the field.
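
The bandit setting described in the abstract can be illustrated with a minimal sketch. The code below runs UCB1, a standard textbook bandit algorithm, on simulated Bernoulli arms. It is not taken from the funded project, and the function name `ucb1`, the arm means, and the horizon are hypothetical choices made only for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms and return the pull count per arm.

    arm_means are hypothetical success probabilities, used here only to
    simulate feedback from the environment.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # number of times each arm was pulled
    totals = [0.0] * n_arms  # cumulative reward collected from each arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Pull every arm once so that each empirical mean is defined.
            arm = t - 1
        else:
            # Choose the arm with the largest optimistic estimate:
            # empirical mean plus an exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        # The environment returns feedback within a single time-step.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts


if __name__ == "__main__":
    print(ucb1([0.2, 0.5, 0.8], horizon=5000))
```

In this sketch the exploration bonus shrinks as an arm is pulled more often, so the agent gradually concentrates its pulls on the arm that appears best. The guarantees of such algorithms rely on assumptions like stationary, independent rewards, which is the kind of assumption the abstract says real applications may violate.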

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Durand, Audrey

An economic evaluation: Simulation of the cost-effectiveness and cost-utility of universal prevention strategies against osteoporosis-related fractures.
  • DOI:
    10.1002/jbmr.1758
  • Publication date:
    2013-02
  • Journal:
  • Impact factor:
    6.2
  • Authors:
    Nshimyumukiza, Leon;Durand, Audrey;Gagnon, Mathieu;Douville, Xavier;Morin, Suzanne;Lindsay, Carmen;Duplantie, Julie;Gagne, Christian;Jean, Sonia;Giguere, Yves;Dodin, Sylvie;Rousseau, Francois;Reinharz, Daniel
  • Corresponding author:
    Reinharz, Daniel
Pre-trial cocaine biases choice toward cocaine through suppression of the nondrug option
  • DOI:
    10.1016/j.pbb.2018.07.010
  • Publication date:
    2018-10-01
  • Journal:
  • Impact factor:
    3.6
  • Authors:
    Freese, Luana;Durand, Audrey;Ahmed, Serge H.
  • Corresponding author:
    Ahmed, Serge H.
Informing the development of an outcome set and banks of items to measure mobility among individuals with acquired brain injury using natural language processing.
  • DOI:
    10.1186/s12883-022-02938-1
  • Publication date:
    2022-12-09
  • Journal:
  • Impact factor:
    2.6
  • Authors:
    Alhasani, Rehab;Godbout, Mathieu;Durand, Audrey;Auger, Claudine;Lamontagne, Anouk;Ahmed, Sara
  • Corresponding author:
    Ahmed, Sara
Cost-effectiveness and accuracy of prenatal Down syndrome screening strategies: should the combined test continue to be widely used?
The Influence of Age, Sex, and Socioeconomic Status on Glycemic Control Among People With Type 1 and Type 2 Diabetes in Canada: Patient-Led Longitudinal Retrospective Cross-sectional Study With Multiple Time Points of Measurement.
  • DOI:
    10.2196/35682
  • Publication date:
    2023-04-27
  • Journal:
  • Impact factor:
    0
  • Authors:
    Mousavi, Seyedmostafa;Tannenbaum Greenberg, Dana;Ndjaboue, Ruth;Greiver, Michelle;Drescher, Olivia;Chipenda Dansokho, Selma;Boutin, Denis;Chouinard, Jean-Marc;Dostie, Sylvie;Fenton, Robert;Greenberg, Marley;McGavock, Jonathan;Najam, Adhiyat;Rekik, Monia;Weisz, Tom;Willison, Donald J;Durand, Audrey;Witteman, Holly O
  • Corresponding author:
    Witteman, Holly O

Other grants held by Durand, Audrey

Interactive reinforcement learning for adaptive experimental design
  • Grant number:
    RGPIN-2020-06933
  • Fiscal year:
    2022
  • Funding amount:
    $17,500
  • Project type:
    Discovery Grants Program - Individual
Interactive reinforcement learning for adaptive experimental design
  • Grant number:
    RGPIN-2020-06933
  • Fiscal year:
    2021
  • Funding amount:
    $17,500
  • Project type:
    Discovery Grants Program - Individual
Interactive reinforcement learning for adaptive experimental design
  • Grant number:
    DGECR-2020-00327
  • Fiscal year:
    2020
  • Funding amount:
    $17,500
  • Project type:
    Discovery Launch Supplement
Self-management of distributed resources in wireless sensor networks
  • Grant number:
    443420-2013
  • Fiscal year:
    2014
  • Funding amount:
    $17,500
  • Project type:
    Postgraduate Scholarships - Doctoral
Self-management of distributed resources in wireless sensor networks
  • Grant number:
    443420-2013
  • Fiscal year:
    2013
  • Funding amount:
    $17,500
  • Project type:
    Postgraduate Scholarships - Doctoral

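Similar National Natural Science Foundation of China (NSFC) grants

Testing for reinforcement in hybrid zones of the genus Sonneratia and its genetic basis
  • Grant number:
    30800060
  • Approval year:
    2008
  • Funding amount:
    CNY 230,000
  • Project type:
    Young Scientists Fund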

Similar overseas grants

III: Small: Deep Interactive Reinforcement Learning for Self-optimizing Feature Selection
  • Grant number:
    2152030
  • Fiscal year:
    2022
  • Funding amount:
    $17,500
  • Project type:
    Standard Grant
Interactive reinforcement learning for adaptive experimental design
  • Grant number:
    RGPIN-2020-06933
  • Fiscal year:
    2022
  • Funding amount:
    $17,500
  • Project type:
    Discovery Grants Program - Individual
Interactive development of reinforcement learning and adaptive memory
  • Grant number:
    10618984
  • Fiscal year:
    2021
  • Funding amount:
    $17,500
  • Project type:
Interactive reinforcement learning for adaptive experimental design
  • Grant number:
    RGPIN-2020-06933
  • Fiscal year:
    2021
  • Funding amount:
    $17,500
  • Project type:
    Discovery Grants Program - Individual
Interactive development of reinforcement learning and adaptive memory
  • Grant number:
    10426161
  • Fiscal year:
    2021
  • Funding amount:
    $17,500
  • Project type:
Interactive development of reinforcement learning and adaptive memory
  • Grant number:
    10200405
  • Fiscal year:
    2021
  • Funding amount:
    $17,500
  • Project type:
Interactive reinforcement learning for adaptive experimental design
  • Grant number:
    DGECR-2020-00327
  • Fiscal year:
    2020
  • Funding amount:
    $17,500
  • Project type:
    Discovery Launch Supplement
Towards interactive explanatory reinforcement learning for aligned and trustworthy agents
  • Grant number:
    2314554
  • Fiscal year:
    2019
  • Funding amount:
    $17,500
  • Project type:
    Studentship
Curiosity-driven reinforcement learning algorithms for large scale interactive sculpture systems
  • Grant number:
    451938-2013
  • Fiscal year:
    2015
  • Funding amount:
    $17,500
  • Project type:
    Industrial Postgraduate Scholarships
Curiosity-driven reinforcement learning algorithms for large scale interactive sculpture systems
  • Grant number:
    451938-2013
  • Fiscal year:
    2014
  • Funding amount:
    $17,500
  • Project type:
    Industrial Postgraduate Scholarships