
Application of scalable safe reinforcement learning to high-risk robotics

Basic information

Grant number: 21J15633
Principal investigator: Zhu Lingwei
Amount: $9,600
Host institution: Nara Institute of Science and Technology
Host institution country: Japan
Project category: Grant-in-Aid for JSPS Fellows
Fiscal year: 2021
Funding country: Japan
Period: 2021-04-28 to 2023-03-31
Project status: Completed

Source: https://kaken.nii.ac.jp/en/grant/KAKENHI-PROJECT-21J15633/
Keywords: Reinforcement learning; entropy regularization; safety and robustness

Project abstract

In summary, the research has progressed well. Important steps toward reinforcement learning for high-risk control have been made as planned. As stated in the research plan, the first year focused on solving the theoretical problems. The results solve the fundamental problem of how to use entropy to build a more robust reinforcement learning framework and, subsequently, risk-sensitive control. The work tackled the problem from several perspectives: increasing robustness directly, ensuring learning improvement, and making use of the more stable Tsallis entropy.

As a result, five papers have been published at or submitted to top international conferences:
[1] Cautious Actor Critic, Asian Conference on Machine Learning 2021
[2] Geometric Value Iteration - Dynamic Error Aware KL Regularization for Reinforcement Learning, Asian Conference on Machine Learning 2021
[3] q-Munchausen Reinforcement Learning, Uncertainty in Artificial Intelligence 2022 (under review)
[4] Enforcing KL Regularization in Maximum Tsallis Entropy Framework via Advantage Learning, Uncertainty in Artificial Intelligence 2022 (under review)
[5] Lower Bound Maximizing Monotonic Policy Improvement, Uncertainty in Artificial Intelligence 2022 (under review)


Project outcomes

Journal papers (2)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Cautious Actor-Critic

DOI:
Published: 2021-07
Journal: ArXiv
Impact factor: 0
Authors: Lingwei Zhu; Toshinori Kitamura; Takamitsu Matsubara
Corresponding authors: Lingwei Zhu; Toshinori Kitamura; Takamitsu Matsubara


Other publications by Zhu Lingwei

Microplastics removal and characteristics of constructed wetlands WWTPs in rural area of Changsha, China: A different situation from urban WWTPs

DOI:
Published: 2022
Journal: Science of the Total Environment
Impact factor: 9.8
Authors: Long Yuannan; Zhou Zhenyu; Yin Lingshi; Wen Xiaofeng; Xiao Ruihao; Du Li; Zhu Lingwei; Liu Rongxuan; Xu Qianhui; Li Huiling; Nan Ruichuan; Yan Shixiong
Corresponding author: Yan Shixiong

Automated Sleep Staging via Parallel Frequency-Cut Attention

DOI: 10.1109/tnsre.2023.3243589
Published: 2023
Journal: IEEE Transactions on Neural Systems and Rehabilitation Engineering
Impact factor: 4.9
Authors: Chen Zheng; Yang Ziwei; Zhu Lingwei; Chen Wei; Tamura Toshiyo; Ono Naoaki; Altaf-Ul-Amin Md; Kanaya Shigehiko; Huang Ming
Corresponding author: Huang Ming
