Dr. Rina Dechter @ UCI

Abstract
In this paper we consider computational aspects of decision-theoretic planning modeled by Markov decision processes (MDPs). Commonly used algorithms, such as value iteration (VI) [Bellman, 1957] and several versions of modified policy iteration (MPI) [Puterman, 1994] (a modification of the original Howard's policy iteration (PI) [Howard, 1960]), are compared on a class of problems from the motion planning domain. Policy iteration and its modifications are usually recommended as algorithms demonstrating a better performance than value iteration [Russel & Norvig, 1995], [Puterman, 1994]. However, our results show that their performance is not always superior and depends on the parameters of a problem and the parameters of the algorithms, such as number of iterations in the value determination procedure in MPI. Moreover, policy iteration applied to non-discounted models without special restrictions might not even converge to an optimal policy, as in case of the policy iteration algorithm introduced in [Russel & Norvig, 1995]. We also introduce a new stopping criterion into value iteration based on policy changes. The combined value-policy iteration (CVPI) algorithm proposed in the paper implements this criterion and generates an optimal policy faster then both policy and value iteration algorithms.

R42a
Value Iteration And Policy Iteration Algorithms For Markov Decision Problem `Elena Pashenkova, Irina Rish (irinar@ics.uci.edu) & Rina Dechter (dechter@ics.uci.edu)`

Publications & Technical Reports