Variable Independence in Markov Decision ProblemsIrina Rish (firstname.lastname@example.org) & Rina Dechter (email@example.com)
In decision-theoretic planning, the problem of planning under uncertainty is formulated as a multidimensional, or factored MDP. Traditional dynamic programming techniques are inecient for solving factored MDPs whose state and action spaces are exponential in the number of the state and action variables, correspondingly. We focus on exploiting problems' structure imposed by variable independence that implies decomposability of transitional probabilities, rewards, and policies, and is captured by theinteraction graph of an MDP, obtained from its in uence diagram. Using the framework of bucket elimination, we formulate a variable elimination algorithm elim-meu-id for computing maximum expected utility, given an inuence diagram, and apply it to MDPs. Traditional dynamic programming techniques for solving finite and infinite-horizon MDPs, such asbackward induction, value iteration, andpolicy iteration, can be also viewed as bucket elimination algorithms applied to a particular ordering of the state and decision variables. The time and space complexity of elimination algorithms is O(exp(wo*)), where wo* the induced width of the interaction graph along the ordering o of its nodes. Unifying framework of bucket elimination makes complexity analysis and variable ordering heuristics developed in constraint-based and probabilistic reasoning applicable to decision-theoretic planning. As we show, selecting "good" orderings improves the efficiency of traditional MDP algorithms.