
Research on Reinforcement Learning Based on Statistical Learning (統計的学習に基づく強化学習に関する研究)

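Basic Information

Grant number:
20700208
Principal investigator:
森健
Amount:
about $19,100
Host institution:
Kyoto University
Host institution country:
Japan
Project category:
Grant-in-Aid for Young Scientists (B)
Fiscal year:
2008
Funding country:
Japan
Period:
2008 to 2009
Project status:
Completed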

Source:
https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-20700208/
Keywords:
reinforcement learning; statistical learning

Project Abstract

Many reinforcement learning methods need to approximate a value function, which expresses the long-term desirability of taking a given action in a given state. The most widely used approach is linear function approximation, in which the value function is represented as the inner product of a parameter vector and a set of basis functions. The basis functions are obtained through trial and error by the designer. Methods that construct basis functions automatically also exist, but they incur a very large computational cost. We have proposed an approximation method that sequentially reduces the approximation error of the value function, and this fiscal year we worked mainly on publishing it. The method requires no prior trial and error by the designer, and its computational cost is small. We published the basic algorithm as an international conference paper, and a more robust version of the algorithm as a second international conference paper. We further developed the theoretical and experimental analysis of the algorithm's properties and submitted the results to an academic journal, but the paper has not yet been accepted. We believe that clarifying the statistical properties of the algorithm as a whole will make further publications possible.

We also applied the various statistical-learning-based reinforcement learning algorithms we have devised so far to a physical robot purchased with this grant and attempted learning on it. Specifically, we built a two-wheeled robot with LEGO Mindstorms and balanced it using a new reinforcement learning method. We believe that automatically tuning the balancing of a two-wheeled robot can help improve the riding comfort of individual bicycle and motorcycle riders, and can in turn lead to lower accident rates.
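
The linear function approximation mentioned in the abstract can be illustrated with a small sketch. It is not the project's proposed method. It is ordinary TD(0) with a fixed, hand-designed basis, which is the baseline setting the abstract describes. The five-state random walk task, the two basis functions, the step size, and all function names are assumptions chosen for illustration. In this task the true value of state s is (s + 1) / 6, which this particular basis happens to be able to represent exactly.

```python
import random


def features(state, n_states):
    """Hypothetical hand-designed basis: a constant and the scaled state index."""
    return [1.0, state / (n_states - 1)]


def value(theta, phi):
    """Linear approximation: inner product of parameters and basis functions."""
    return sum(t * p for t, p in zip(theta, phi))


def td0_random_walk(n_states=5, episodes=20000, alpha=0.002, gamma=1.0, seed=0):
    """Estimate state values of a symmetric random walk with TD(0).

    Assumed task: states 0..n_states-1, start in the middle, step left or
    right with equal probability. Leaving on the right pays reward 1,
    leaving on the left pays 0, and both end the episode.
    """
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(episodes):
        s = n_states // 2
        while True:
            s_next = s + rng.choice((-1, 1))
            done = s_next < 0 or s_next >= n_states
            reward = 1.0 if s_next >= n_states else 0.0
            phi = features(s, n_states)
            # A terminal successor has value 0 by definition.
            v_next = 0.0 if done else value(theta, features(s_next, n_states))
            # TD error: one-step target minus the current estimate.
            delta = reward + gamma * v_next - value(theta, phi)
            # Gradient step; for a linear model the gradient is phi itself.
            theta = [t + alpha * delta * p for t, p in zip(theta, phi)]
            if done:
                break
            s = s_next
    return theta


if __name__ == "__main__":
    theta = td0_random_walk()
    for s in range(5):
        print(s, round(value(theta, features(s, 5)), 3), round((s + 1) / 6, 3))
```

Running the script prints each state's estimate next to its true value. If the basis were unable to represent the true value function, the estimates would settle on a biased fit, which is the reason basis design matters in this setting.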