A Study on the Optimization of Expected Utility and Risk in Stochastic Multistage Decision Processes

Basic information

Grant number:
07640307
Principal investigator:
門田良信
Amount:
$8,300
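Host institution:
Wakayama University
Host institution country:
Japan
Project category:
Grant-in-Aid for General Scientific Research (C)
Fiscal year:
1995
Funding country:
Japan
Project period:
1995 to (no data)
Project status:
Completed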

Source:
https://kaken.nii.ac.jp/en/grant/KAKENHI-PROJECT-07640307/
Keywords:
Markov decision process; utility function; utility optimal policy; utility deviation; risk premium; OLA-stopping time

Project abstract

For a Markov decision process $(S, A, q_{ij}(a), r(i,a))$ and a utility function $g$, the existence of optimal policies and the optimality equation for the expected utility of the present value $\beta$ had been studied up to last year. This fiscal year the following results were obtained.

1. Let the state space $S$ be countable and $g$ Borel measurable. At each time $t = 0, 1, 2, \dots$, stopping yields a reward $r(i)$ and continuing costs $c$. Define $S^*_t\{g\}$ as the set of states in which stopping at time $t$ is better than the expected utility of stopping optimally at time $t+1$ or later, and let $\sigma^*$ be the stopping time that stops on first entering $S^*_t\{g\}$.

(1) A sufficient condition for $\sigma^*$ to be a $g$-optimal stopping time was obtained.

(2) Let $S_t\{g\}$ denote the set of states in which stopping at time $t$ is better than the expected utility of stopping at time $t+1$, and let $\sigma$ be the (OLA) stopping time that stops on first entering $S_t\{g\}$. If $\{S_t\{g\}\}$ is closed under state transitions, then $\sigma = \sigma^*$.

2. Let the state space be finite and the utility function $g$ strictly increasing. Alongside the risk premium $\rho^\pi_i(\beta) = E^\pi_i(\beta) - g^{-1}(E^\pi_i(g(\beta)))$, the utility deviation of the Markov decision process is defined, for present value $\beta$, policy $\pi$, and initial state $i$, by $k^\pi_i = E^\pi_i\{g(\beta)\} - g(E^\pi_i(\beta))$. With a suitably defined operator $T$ on the set of distributions associated with policies, one obtains a vector equation for $k^\pi = (k^\pi_i;\ i \in S)$: $k^\pi = g_i + \sum_{j \in S} q_{ij}(\pi)\, k^\pi(T^i(F^\pi)_j)$. The recursive formula for the risk premium itself is very complicated, but the risk premium can be studied through this equation. For example, for the problem of minimizing the risk premium subject to maximal expected reward, a sufficient condition for the existence of an optimal policy and the optimality equation were found for the case $\sum_{j \in S} q_{ij}(a) < 1$.
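The two quantities defined in item 2 of the abstract, the risk premium and the utility deviation, can be illustrated numerically. The following is a minimal sketch, not the project's method: it assumes the present value takes finitely many values with known probabilities (standing in for its distribution under a fixed policy and initial state), and it picks an exponential utility purely for illustration. The function names, the utility `g`, its inverse `g_inv`, and the two-point distribution are all hypothetical.

```python
import math


def expectation(values, probs, f=lambda x: x):
    """Expected value of f(X) for a finite distribution given as values and probabilities."""
    return sum(p * f(v) for v, p in zip(values, probs))


def risk_premium(values, probs, g, g_inv):
    """rho = E[beta] - g^{-1}(E[g(beta)]): mean minus certainty equivalent."""
    return expectation(values, probs) - g_inv(expectation(values, probs, g))


def utility_deviation(values, probs, g):
    """k = E[g(beta)] - g(E[beta])."""
    return expectation(values, probs, g) - g(expectation(values, probs))


# Assumed utility for illustration only: strictly increasing and concave.
def g(x):
    return 1.0 - math.exp(-x)


def g_inv(u):
    return -math.log(1.0 - u)


# Hypothetical distribution of the present value: 0 or 2 with equal probability.
values = [0.0, 2.0]
probs = [0.5, 0.5]

rho = risk_premium(values, probs, g, g_inv)
k = utility_deviation(values, probs, g)
print(f"risk premium rho = {rho:.4f}")
print(f"utility deviation k = {k:.4f}")
```

Because `g` is strictly increasing, the two quantities always have opposite signs (or are both zero): `rho >= 0` exactly when `k <= 0`. With the concave utility assumed here the risk premium is positive and the utility deviation is negative, and both vanish when the present value is deterministic.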