Exploiting Graph Cutsets for Sampling-Based Approximations in Bayesian Networks
Submitted in partial satisfaction of the requirements for the degree of Doctor Of Philosophy in Information and Computer Science Bozhena Bidyuk
Automated reasoning with graphical models has found many practical applications in domains such as planning, vision, speech recognition, genetic linkage analysis, diagnostics, and many others. Graphical models, combining graph theory and probability theory, facilitate a compact and structured representation for problems with uncertainty and provide a mechanism for answering queries such as computing the probability of an event given observations.
Several exact algorithms for reasoning with graphical models exist. However, exact computation is not always possible due to prohibitive time and memory demands. In general, computing exact posterior marginals and even approximating posterior marginals within a desired degree of precision is NP-hard. In practice, we often choose methods that can quickly compute approximate answers to Bayesian queries, trading accuracy for speed. Approximation methods include algorithms for approximate inference, stochastic sampling, network simplifications (simplifying the structure of the underlying graph), and variational approximations. We often obtain a more flexible computation scheme, balancing complexity and accuracy, by combining exact and approximate computation. This dissertation focuses on combining search with existing sampling and bounding methods yielding two new schemes for approximating and bounding posterior marginals in Bayesian networks. Those two new schemes, cutset sampling and any-time bounds, exploit the network structure to bound the complexity of exact computation.
Cutset sampling for computing approximate posterior marginals samples only a subset of variables, a cutset of the underlying graph. Since reducing the size of the sampling set results in lower sampling variance, cutset-based sampling converges faster than sampling on a full set of variables. Two variants of cutset sampling algorithm were developed. One, based on Gibbs sampling, is a general approach to collapsed Gibbs sampling in Bayesian networks. The second algorithm implements the likelihood weighting on a cutset. The proposed any-time bounds framework is an any-time scheme for computing bounds on posterior marginals. It enumerates a subset of cutset tuples and performs exact inference over these tuples and then bounds the remaining probability mass.
Both methods exploit the problemís underlying network structure to control the time and space complexity of the computations. They focus on finding a cutset of the graph such that the complexity of exact reasoning is bounded when the cutset variables are assigned. The dissertation proposes a new algorithm for finding a minimum cost cutset that yields the specified complexity bound on exact inference.