安定・安全を指向する逆強化学習に基づく運転行動モデリング

Driving Behavior Modeling Based on Stability- and Safety-Oriented Inverse Reinforcement Learning

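Basic Information

Grant number:
21H03517
Principal investigator:
下坂正倫
Amount:
$109,000
Host institution:
Tokyo Institute of Technology
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (B)
Fiscal year:
2021
Funding country:
Japan
Project period:
2021-04-01 to 2024-03-31
Project status:
Completed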

Project Abstract

In recent years, there has been active development of advanced driver-assistance systems, technologies that assist the driver in operating the vehicle. Advancing these technologies calls for appropriate models of the driving norms of skilled drivers and for prediction techniques built on those models. This study focuses on inverse reinforcement learning as one framework for such modeling and prediction and, given the characteristics of driving behavior as an application, aims to establish a methodology centered on stability and safety. Inverse reinforcement learning broadly consists of two parts: 1) generating the optimal path under a given reward field, and 2) updating the reward field based on the difference between the demonstrated trajectories and the optimal path generated in 1). Because 2) depends heavily on 1), the properties of 1) strongly affect whether inverse reinforcement learning succeeds. For automobile driving behavior, globally optimal path generation in a discrete state space, as discussed in classical inverse reinforcement learning, is difficult. Instead, one has to work with locally optimal path generation in a high-dimensional continuous state space, where the lack of stability in path generation is a problem.

In this study, we attempted to stabilize path generation by adopting a framework that searches the entire search space stochastically and exhaustively. We also concentrated on making more efficient use of the results of 1) within 2), a point that previous research had not discussed. Specifically, for 1), we developed a template-based search method that adapts the RRT (rapidly-exploring random tree) path-search technique, widely used in robotics, to nonholonomic motion. For 2), we developed an importance sampling method that exploits the RRT results and built an efficient reward-field update algorithm on top of it. We evaluated both path generation and reward-field recovery on lane-change tasks and on right- and left-turn tasks at intersections, and verified the effectiveness of the proposed framework.
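The reward-field update in step 2) can be illustrated with a small sketch. The abstract does not give the objective or the update rule, so everything below is an assumption: a linear reward w · f(path), a maximum-entropy-style gradient (demonstration features minus expected features), and self-normalized importance weights over a set of sampled paths. The function names, the feature vectors, and the proposal log-densities are hypothetical; in the project the sampled paths would come from the RRT search rather than a hard-coded list.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def importance_weights(w: Sequence[float],
                       feats: Sequence[Sequence[float]],
                       log_q: Sequence[float]) -> List[float]:
    """Self-normalized weights proportional to exp(w . f_i) / q_i."""
    log_w = [dot(w, f) - lq for f, lq in zip(feats, log_q)]
    shift = max(log_w)  # subtract the max for numerical stability
    unnorm = [math.exp(v - shift) for v in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def expected_features(w: Sequence[float],
                      feats: Sequence[Sequence[float]],
                      log_q: Sequence[float]) -> List[float]:
    """Importance-weighted estimate of the expected path features."""
    weights = importance_weights(w, feats, log_q)
    dim = len(feats[0])
    return [sum(wt * f[k] for wt, f in zip(weights, feats))
            for k in range(dim)]


def update_reward(w: Sequence[float],
                  demo_feats: Sequence[float],
                  feats: Sequence[Sequence[float]],
                  log_q: Sequence[float],
                  lr: float = 0.1) -> List[float]:
    """One gradient-ascent step: w += lr * (f_demo - E[f])."""
    exp_f = expected_features(w, feats, log_q)
    return [wk + lr * (d - e) for wk, d, e in zip(w, demo_feats, exp_f)]


# Hypothetical data: three sampled paths with two features each, drawn
# from a uniform proposal (log q is constant, so it cancels out).
sampled_feats = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
sampled_log_q = [0.0, 0.0, 0.0]
demo_feats = [0.8, 0.2]  # feature vector of the demonstrated trajectory

w = [0.0, 0.0]
for _ in range(200):
    w = update_reward(w, demo_feats, sampled_feats, sampled_log_q, lr=0.5)
```

After the loop, `expected_features(w, sampled_feats, sampled_log_q)` should be close to `demo_feats`, which is the matching condition this kind of update aims for. Reusing one set of sampled paths across many updates, instead of regenerating paths after every change to w, is where the efficiency gain of an importance-sampling approach would come from.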