Sampling Algorithms for Probabilistic Graphical Models with Determinism.

Vibhav Gogate

Mixed constraint and probabilistic graphical models occur quite frequently in many real world applications. Examples include: genetic linkage analysis, functional/software verification, target tracking and activity modeling. Query answering and in particular probabilistic inference on such graphical models is computationally hard often requiring exponential time in the worst case. Therefore in practice sampling algorithms are widely used for providing an approximate answer. In presence of deterministic dependencies or hard constraints, however, sampling has to overcome some principal challenges. In particular, importance sampling type schemes suffer from what is known as the rejection problem in that samples having zero weight may be generated with probability arbitrarily close to one yielding useless results. On the other hand, Markov Chain Monte Carlo techniques do not converge at all often yielding highly inaccurate estimates.

In this thesis, we address these problems in a two fold manner. First, we utilize research done in constraint satisfaction and satisfiability communities for processing constraints to reduce or eliminate rejection. Second, mindful of the time overhead in sample generation due to determinism, we both make and utilize advances in statistical estimation theory to make the "most" out of the generated samples.

Utilizing constraint satisfaction and satisfiability research, we propose two classes of sampling algorithms - one based on consistency enforcement and the other based on systematic search. The consistency enforcement class of algorithms work by shrinking the domains of random variables, by strengthening constraints, or by creating new ones, so that some or all zeros in the problem space can be removed. This improves convergence because of dimensionality reduction and also reduces rejection because many zero weight samples will not be generated. Our systematic search based techniques called SampleSearch manage the rejection problem by interleaving sampling with backtracking search. In this scheme, when a sample is supposed to be rejected, the algorithm continues instead with systematic backtracking search until a strictly positive-weight sample is generated. The strength of this scheme is that any state-of-the-art constraint satisfaction or propositional satisfiability search algorithm can be used with minor modifications. Through large scale experimental evaluation, we show that SampleSearch outperforms all state-of-the-art schemes when a significant amount of determinism is present in the graphical model. Subsequently, we combine SampleSearch with known statistical techniques such as Sampling Importance Resampling and Metropolis Hastings yielding efficient algorithms for sampling solutions from a uniform distribution over the solutions of a Boolean satisfiability formula. Unlike state-of-the-art algorithms, our SampleSearch-based algorithms guarantee convergence in the limit.

As to statistical estimation, we make two distinct contributions. First, we propose several new statistical inequalities extending the one-sample Markov inequality to multiple samples which can be used in conjunction with SampleSearch to probabilistically lower bound likelihood tasks over mixed networks. Second, we present a novel framework called AND/OR importance sampling which generalizes the process of computing sample mean by exploiting AND/OR search spaces for graphical models. Specifically we provide a spectrum of AND/OR sample means which are defined on the same set of samples but derive different estimates trading variance with time. At one end is the AND/OR sample tree mean which has smaller variance than the conventional OR sample tree mean and has the same time complexity. At the other end is the AND/OR graph sample mean which has even lower variance but has higher time and space complexity. We demonstrate empirically that AND/OR sample means are far closer to the exact answer than the conventional OR sample mean.