A study of human perceptual-motor learning process using reward estimation in inverse reinforcement learning

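Basic Information

Grant number:
20K12576
Principal investigator:
薬師神玲子
Amount:
$27,500
Host institution:
Aoyama Gakuin University
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (C)
Fiscal year:
2020
Funding country:
Japan
Project period:
2020-04-01 to 2024-03-31
Project status:
Completed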

Project Abstract

Elucidating how people acquire the various skills needed in daily life, work, sports, and similar settings, and developing ways to facilitate that learning, has long been a major theme in psychology. In recent years it has continued to be studied under the concepts of implicit learning and situated learning. Because this kind of learning is difficult to verbalize, examining the internal learning process depends on methods that estimate it from quantitative measurements of performance. This study uses inverse reinforcement learning, a computational model developed in machine learning and robot control, to estimate over time the reward function that learners actually used during learning. It then examines how changes in this reward function relate to explicit knowledge given during learning (advice) and to the explicit expression of an individual's own knowledge (tests). This is expected to lead to a quantitative model of perceptual-motor learning that includes the interaction between implicit and explicit processes.

Inverse reinforcement learning attempts to estimate the reward a participant used from that participant's performance (for example, the sequence of keys pressed). So far, starting from the perceptual matching task that the principal investigator has used in earlier perceptual-motor learning research, an inverse reinforcement learning model for estimating rewards was derived, and records of human performance from past experiments (key-press sequences) were analyzed with it. In addition, to improve the precision of the analysis, a new task (a trajectory learning task) was created. It uses an input device with more degrees of freedom (a trackpad) that can capture changes in participants' behavior in finer detail, and it mimics a more natural and simpler perceptual-motor matching situation. The plan is to use this task to collect new experimental data and estimate reward functions.
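To make the idea of estimating rewards from key-press records concrete, the following is a minimal sketch and not the project's model. It assumes a deliberately simplified one-step setting with no states, in which the participant picks each key with probability proportional to the exponential of its reward (a softmax choice rule). The function names `estimate_rewards` and `rewards_over_time`, the key labels, and the window size are all hypothetical and introduced only for illustration.

```python
import math
from collections import Counter


def softmax(values):
    """Convert a list of rewards into choice probabilities."""
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_rewards(key_presses, keys, steps=2000, lr=0.5):
    """Fit rewards by maximum likelihood under the assumed softmax choice rule.

    Rewards are only identifiable up to an additive constant, so the
    result is centered to have zero mean.
    """
    counts = Counter(key_presses)
    n = len(key_presses)
    empirical = [counts[k] / n for k in keys]
    rewards = [0.0] * len(keys)
    for _ in range(steps):
        policy = softmax(rewards)
        # Gradient of the mean log-likelihood:
        # observed frequency minus predicted probability.
        rewards = [r + lr * (e - p)
                   for r, e, p in zip(rewards, empirical, policy)]
    mean = sum(rewards) / len(rewards)
    return {k: r - mean for k, r in zip(keys, rewards)}


def rewards_over_time(key_presses, keys, window):
    """Estimate rewards separately in consecutive, non-overlapping blocks."""
    return [estimate_rewards(key_presses[i:i + window], keys)
            for i in range(0, len(key_presses) - window + 1, window)]


if __name__ == "__main__":
    # Made-up key-press record: the participant shifts from "left" to "right".
    record = ["left"] * 8 + ["right"] * 2 + ["left"] * 2 + ["right"] * 8
    for block, est in enumerate(rewards_over_time(record, ["left", "right"], 10)):
        print(block, est)
```

Comparing the per-block estimates shows how an inferred reward can drift as behavior changes, which is the kind of time course the abstract proposes to relate to advice and tests. A full inverse reinforcement learning model would also account for task states and the long-run consequences of each action, which this sketch omits.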