R42  
Variable Independence in Markov Decision Problems
Irina Rish (irinar@ics.uci.edu) &
Rina Dechter (dechter@ics.uci.edu)

Abstract. In decision-theoretic planning, the problem of planning under uncertainty is formulated as a multidimensional, or factored, MDP. Traditional dynamic programming techniques are inefficient for solving factored MDPs, whose state and action spaces are exponential in the number of state and action variables, respectively. We focus on exploiting problem structure imposed by variable independence, which implies decomposability of transition probabilities, rewards, and policies, and is captured by the interaction graph of an MDP, obtained from its influence diagram. Using the framework of bucket elimination [9], we formulate a variable elimination algorithm, elim-meu-id, for computing the maximum expected utility given an influence diagram, and apply it to MDPs. Traditional dynamic programming techniques for solving finite- and infinite-horizon MDPs, such as backward induction, value iteration, and policy iteration, can also be viewed as bucket elimination algorithms applied to a particular ordering of the state and decision variables. The time and space complexity of elimination algorithms is O(exp(w_{o}^{*})), where w_{o}^{*} is the induced width of the interaction graph along the ordering o of its nodes. The unifying framework of bucket elimination makes complexity analysis and variable ordering heuristics developed in constraint-based and probabilistic reasoning applicable to decision-theoretic planning. As we show, selecting "good" orderings improves the efficiency of traditional MDP algorithms.
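To make the complexity claim concrete, the induced width w_{o}^{*} that governs the O(exp(w_{o}^{*})) bound can be computed by simulating elimination along an ordering: when a node is eliminated, all of its remaining neighbors become connected ("fill-in" edges), and the width is the largest neighbor set encountered. The following is a minimal sketch (the function name and graph representation are illustrative, not from the paper):

```python
def induced_width(edges, ordering):
    """Induced width of the graph along `ordering` (illustrative helper).

    `edges` is an iterable of node-name pairs; `ordering` lists every node,
    and nodes are eliminated from last to first, as in bucket elimination.
    """
    # Build an undirected adjacency structure.
    adj = {v: set() for v in ordering}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    width = 0
    # Eliminate nodes from the end of the ordering toward the front.
    for node in reversed(ordering):
        neighbors = adj.pop(node)
        width = max(width, len(neighbors))
        # Remove the eliminated node from its neighbors' adjacency sets,
        # then connect all remaining neighbors pairwise (fill-in edges).
        for u in neighbors:
            adj[u].discard(node)
        for u in neighbors:
            for v in neighbors:
                if u != v:
                    adj[u].add(v)
    return width

# A 4-cycle a-b-c-d has induced width 2 along this ordering:
# eliminating d connects a and c, so c is then eliminated with 2 neighbors.
print(induced_width([("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")],
                    ["a", "b", "c", "d"]))  # -> 2
```

Since different orderings of the same graph can yield very different induced widths, this is exactly the quantity the variable ordering heuristics mentioned in the abstract try to minimize.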