Based on 2008 AAAI: Probabilistic Planning via Determinization in Hindsight
## MDP formulation

\( M = \langle S, s_0, A, P, r \rangle \), with
- \(S\): finite set of states
- \(s_0 \in S\): initial state
- \(A\): finite set of actions
- \(P\): transition function \( P(s' \mid s, a) \), giving the probability of reaching \(s'\) by executing \(a\) in state \(s\)
- \(r\): reward function \( r(s' \mid s, a) \), or \( r(s) \) if the reward depends only on the state

## MDP determinization

A T-horizon future is a function \( F : A \times S \times \{1, \dots, T\} \to [0, 1] \). Given \(M\) and \(F(a, s, t)\), \(M[F]\) denotes a determinization of \(M\). This can be done by partitioning the range \([0, 1]\) among the possible outcomes of action \(a\) in proportion to their probabilities, and selecting as the determinized outcome the one whose partition \(F(a, s, t)\) falls into.

## Value function & Q-value function

The value function maps a state to its optimal value for a T-horizon MDP problem: $$V^*(s, T) = \underset{\pi}{\operatorname{max}}\, E[R(s, F, \pi)]$$ The Q-value function maps a (state, action) pair to its optimal value: $$Q^*(s, a, T) = r(s) + E[V^*(s', T-1)]$$ where \( s' \sim P(\cdot \mid s, a) \) (no further maximization over \(\pi\) is needed, since \(V^*\) already contains it). At a given state \(s\), choosing the action \(a\) with the maximum Q-value is also known as receding-horizon optimal control, with: $$\pi^*(s, T) = \underset{a}{\operatorname{argmax}}\, Q^*(s, a, T)$$

## Hindsight optimization

In theory, the true value function and Q-value function can be computed by value or policy (Q-value) iteration, but these algorithms are often intractable for practical problems. The hindsight optimization (HOP) algorithm approximates the true value function by swapping the expectation and the maximization: $$V_{hs}(s, T) = E[\underset{\pi}{\operatorname{max}}\, R(s, F, \pi)]$$ Since maximizing inside the expectation can only increase the value, \( V_{hs}(s, T) \ge V^*(s, T) \): HOP is an optimistic approximation. Correspondingly, the Q-value is: $$Q_{hs}(s, a, T) = r(s) + E[V_{hs}(s', T-1)]$$ with the HOP policy: $$\pi_{hs}(s, T) = \underset{a}{\operatorname{argmax}}\, Q_{hs}(s, a, T)$$
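The whole pipeline above can be sketched in a few dozen lines: sample uniform futures \(F(a, s, t)\), determinize the MDP by mapping each \(F\)-value onto the outcome partition, solve each deterministic problem optimally, and average to estimate \(Q_{hs}\). The toy two-state MDP, the function names, and the sample count below are all illustrative assumptions, not the paper's experimental setup; it is a minimal sketch of the idea, not a planner.

```python
import random

# Toy MDP (assumed for illustration): states 0 and 1, state 1 pays reward 1.
# P[(s, a)] = list of (next_state, probability) outcomes.
P = {
    (0, "safe"):  [(0, 1.0)],
    (0, "risky"): [(1, 0.5), (0, 0.5)],
    (1, "safe"):  [(1, 1.0)],
    (1, "risky"): [(1, 0.5), (0, 0.5)],
}
STATES = [0, 1]
ACTIONS = ["safe", "risky"]

def reward(s):
    return 1.0 if s == 1 else 0.0

def sample_future(T, seed=None):
    """A T-horizon future: F(a, s, t) drawn uniformly from [0, 1]."""
    rng = random.Random(seed)
    return {(a, s, t): rng.random()
            for a in ACTIONS for s in STATES for t in range(1, T + 1)}

def determinized_next(s, a, t, F):
    """Partition [0, 1] by outcome probabilities of (s, a) and pick the
    partition that F(a, s, t) falls into -- the determinization M[F]."""
    u, acc = F[(a, s, t)], 0.0
    for s2, p in P[(s, a)]:
        acc += p
        if u <= acc:
            return s2
    return P[(s, a)][-1][0]  # guard against floating-point round-off

def best_reward(s, t, T, F):
    """Optimal total reward in the deterministic problem M[F] from
    (s, t); plain recursion is fine at these tiny horizons."""
    if t > T:
        return 0.0
    return reward(s) + max(
        best_reward(determinized_next(s, a, t, F), t + 1, T, F)
        for a in ACTIONS)

def q_hs(s, a, T, n_futures=200):
    """Q_hs(s, a, T) = r(s) + average over sampled futures of the
    optimal hindsight value after taking a first."""
    total = 0.0
    for i in range(n_futures):
        F = sample_future(T, seed=i)  # fixed seeds: common random numbers
        s2 = determinized_next(s, a, 1, F)
        total += best_reward(s2, 2, T, F)
    return reward(s) + total / n_futures

def hop_action(s, T):
    """The HOP policy: argmax over actions of Q_hs."""
    return max(ACTIONS, key=lambda a: q_hs(s, a, T))
```

In this toy model `hop_action(0, 3)` prefers `"risky"`, since only the risky action can ever reach the rewarding state from state 0; reusing the same seeds across actions compares all actions against the same set of futures, which reduces the variance of the argmax.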