Abstracts
| A Graphical Model Representation of the Track-Oriented Multiple Hypothesis Tracker | |
The track-oriented multiple hypothesis tracker is currently the preferred
method for tracking multiple targets in clutter with medium to high
computational resources. This method maintains a structured representation of
the track posterior distribution, which it repeatedly extends and optimizes
over. This representation of the posterior admits probabilistic inference
tasks beyond MAP estimation that have yet to be explored. To this end we
formulate the posterior as a graphical model and show that belief propagation
can be used to approximate the track marginals. These approximate marginals
enable an online parameter estimation scheme that improves tracker performance
in the presence of parameter misspecification.
| Join-graph based cost-shifting schemes | |
We develop several algorithms taking advantage of two common approaches for
bounding MPE queries in graphical models: mini-bucket elimination and
message-passing updates for linear programming relaxations. Both methods are
quite similar, and each offers a useful perspective on the other; our hybrid
approaches attempt to balance the advantages of each. We demonstrate the power
of our hybrid algorithms through extensive empirical evaluation. Most notably,
a Branch and Bound search guided by the heuristic function calculated by one of
our new algorithms has recently won first place in the PASCAL2 inference
challenge.
| A Cluster-Cumulant Expansion at the Fixed Points of Belief Propagation | |
We introduce a new cluster-cumulant expansion (CCE) based on the fixed points
of iterative belief propagation (IBP). This expansion is similar in spirit to
the loop-series (LS) introduced by Chertkov and Chernyak (2006). However,
in contrast to the latter, the CCE enjoys the following important qualities: 1)
it is defined for arbitrary state spaces, 2) it is easily extended to fixed
points of generalized belief propagation (GBP), 3) disconnected groups of
variables do not contribute to the CCE, and 4) the accuracy of the expansion
empirically improves upon that of the LS. The CCE is based on the same
M\"obius transform as the Kikuchi approximation, but unlike GBP it does not
require storing the beliefs of the GBP clusters, nor does it suffer from
convergence issues during belief updating.
| Belief Propagation for Structured Decision Making | |
Variational inference algorithms such as belief propagation have had tremendous
impact on our ability to learn and use graphical models, and give many insights
for developing or understanding exact and approximate inference. However,
variational approaches have not been widely adopted for decision making
in graphical models, often formulated through influence diagrams and including
both centralized and decentralized (or multi-agent) decisions. In this work,
we present a general variational framework for solving structured cooperative decision-making
problems, use it to propose several belief propagation-like algorithms, and
analyze them both theoretically and empirically.
| Approximating the Sum Operation for Marginal-MAP Inference | |
We study the marginal-MAP problem on graphical models, and present a novel
approximation method based on direct approximation of the sum operation. A
primary difficulty of marginal-MAP problems lies in the non-commutativity of
the sum and max operations, so that even in highly structured models,
marginalization may produce a densely connected graph over the variables to be
maximized, resulting in an intractable potential function with exponential
size. We propose a chain decomposition approach for summing over the
marginalized variables, in which we produce a structured approximation to the
MAP component of the problem consisting of only pairwise potentials. We show
that this approach is equivalent to the maximization of a specific variational
free energy, and it provides an upper bound on the optimal probability.
Finally, experimental results demonstrate that our method performs favorably
compared to previous methods.
| Distributed Parameter Estimation via Pseudo-likelihood | |
Estimating statistical models within sensor networks requires distributed
algorithms, in which both data and computation are distributed across the nodes
of the network. We propose a general approach for distributed learning based on
combining local estimators defined by pseudo-likelihood components,
encompassing a number of combination methods, and provide both theoretical and
experimental analysis. We show that simple linear combination or max-voting
methods, when combined with second-order information, are statistically
competitive with more advanced and costly joint optimization. Our algorithms
have many attractive properties including low communication and computational
cost and "any-time" behavior.
| Mini-bucket Elimination with Moment Matching | |
We investigate a hybrid of two styles of algorithms for deriving bounds for
optimization tasks over graphical models: non-iterative message-passing schemes
exploiting variable duplication to reduce cluster sizes (e.g. MBE) and iterative
methods that re-parameterize the problem's functions aiming to produce good
bounds even if functions are processed independently (e.g. MPLP). In this work
we combine both ideas, augmenting MBE with re-parameterization, which we call
MBE with Moment Matching (MBE-MM). The results of preliminary empirical
evaluations show the clear promise of the hybrid scheme over its individual
components (e.g., pure MBE and pure MPLP). Most significantly, we demonstrate the
potential of the new bounds in improving the power of mechanically generated
heuristics for branch and bound search.
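To see the effect of the reparameterization concretely, here is a small numeric sketch (our own simplified setting: two mini-buckets that share a single variable x) of how matching max-marginals tightens the plain mini-bucket bound:

```python
# Illustrative sketch of moment matching across mini-buckets (log domain).
import numpy as np

rng = np.random.default_rng(1)
f = rng.standard_normal((4, 3))   # log factor f(x, y) placed in mini-bucket 1
g = rng.standard_normal((4, 3))   # log factor g(x, z) placed in mini-bucket 2

exact = np.max(f.max(axis=1) + g.max(axis=1))   # true max_{x,y,z} [f + g]
naive = f.max() + g.max()                       # plain MBE: copies of x decoupled

# moment matching: cost-shift each copy of x toward the average max-marginal
mu_f, mu_g = f.max(axis=1), g.max(axis=1)       # max-marginals over shared x
avg = 0.5 * (mu_f + mu_g)
f_mm = f - mu_f[:, None] + avg[:, None]
g_mm = g - mu_g[:, None] + avg[:, None]
matched = f_mm.max() + g_mm.max()               # tightened bound

# naive >= matched >= exact; here matched equals exact because the two
# buckets interact only through the single shared variable x
print(naive, matched, exact)
```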
| Adaptive Exact Inference in Graphical Models | |
Many algorithms and applications involve repeatedly solving
variations of the same inference problem, for example to introduce
new evidence to the model or to change conditional dependencies. As
the model is updated, the goal of \emph{adaptive inference} is to
take advantage of previously computed quantities to perform
inference more rapidly than from scratch. In this paper, we present
algorithms for adaptive exact inference on general graphs that can
be used to efficiently compute marginals and update MAP
configurations under arbitrary changes to the input factor graph and
its associated elimination tree. After a linear time preprocessing
step, our approach enables updates to the model and the computation
of any marginal in time that is logarithmic in the size of the input
model. Moreover, in contrast to max-product, our approach can also be
used to update MAP configurations in time that is roughly
proportional to the number of updated entries, rather than the size
of the input model.
To evaluate the practical effectiveness of our
algorithms, we implement and test them on synthetic data as well as
on two real-world computational biology applications. Our experiments show that
adaptive inference can achieve substantial speedups over performing
complete inference as the model undergoes small changes over time.
| Bounding the Partition Function using H\"older's Inequality | |
We describe an algorithm for approximate inference in graphical models based on
H\"older's inequality that provides upper and lower bounds on common summation
problems such as computing the partition function or probability of evidence
in a graphical model.
Our algorithm unifies and extends several existing approaches,
including variable elimination techniques such as mini-bucket elimination and
variational methods such as tree-reweighted belief propagation and
conditional entropy decomposition.
We show that our method inherits benefits from each approach
to provide significantly better bounds on sum-product tasks.
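The primitive behind these bounds can be shown in a few lines: the generalized H\"older inequality bounds a sum of products of (duplicated-variable) factors by a weighted product of power sums. A small numeric sketch (the weights and factor values here are arbitrary choices):

```python
# Holder upper bound on sum_x f(x) g(x), the elementary step behind
# weighted mini-bucket style bounds (illustrative sketch).
import numpy as np

rng = np.random.default_rng(2)
f = rng.random(6) + 0.1          # positive factors over a shared variable x
g = rng.random(6) + 0.1

exact = np.sum(f * g)            # the summation being bounded

def holder_bound(f, g, w1):
    """(sum f^{1/w1})^{w1} * (sum g^{1/w2})^{w2} with w1 + w2 = 1, w1, w2 > 0."""
    w2 = 1.0 - w1
    return np.sum(f ** (1 / w1)) ** w1 * np.sum(g ** (1 / w2)) ** w2

for w1 in (0.2, 0.5, 0.8):
    assert holder_bound(f, g, w1) >= exact       # always an upper bound
print(exact, min(holder_bound(f, g, w) for w in np.linspace(0.05, 0.95, 19)))
```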
| Fast parallel and adaptive updates for dual-decomposition solvers | |
Dual-decomposition (DD) methods are quickly becoming important tools
for estimating the minimum energy state of a graphical model. DD
methods decompose a complex model into a collection of simpler
subproblems that can be solved exactly (such as trees) and that in
combination provide upper and lower bounds on the exact solution.
Subproblem choice can play a major role:
larger subproblems tend to improve the bound more per iteration,
while smaller subproblems enable highly parallel solvers and can
benefit from re-using past solutions when there are few changes
between iterations.
We propose an algorithm that can balance many of these aspects to
speed up convergence. Our method uses a cluster tree
data structure that has been proposed for adaptive exact inference
tasks, and we apply it in this paper to dual-decomposition
approximate inference. This approach allows us to process large
subproblems to improve the bounds at each iteration, while allowing
a high degree of parallelizability and taking advantage of
subproblems with sparse updates.
For both synthetic inputs and a real-world stereo matching problem,
we demonstrate that our algorithm is able to achieve significant
improvement in convergence time.
| Tightening MRF relaxations with planar subproblems | |
We describe a new technique for computing lower-bounds on the minimum energy
configuration of a planar Markov Random Field (MRF). Our method successively
adds large numbers of constraints and enforces consistency over binary
projections of the original problem state space. These constraints are
represented in terms of subproblems in a dual-decomposition framework that is
optimized using subgradient techniques. The complete set of constraints we
consider enforces cycle consistency over the original graph. In practice we
find that the method converges quickly on most problems with the addition of a
few subproblems and outperforms existing methods for some interesting classes
of hard potentials.
| Planar cycle covering graphs | |
We describe a new variational lower-bound on the minimum energy configuration
of a planar binary Markov Random Field (MRF). Our method is based on adding
auxiliary nodes to every face of a planar embedding of the graph in order to
capture the effect of unary potentials. A ground state of the resulting
approximation can be computed efficiently by reduction to minimum-weight
perfect matching. We show that optimization of variational parameters achieves
the same lower-bound as dual-decomposition into the set of all cycles of the
original graph. We demonstrate that our variational optimization converges
quickly and provides high-quality solutions to hard combinatorial problems
10-100x faster than competing algorithms that optimize the same bound.
| Variational algorithms for marginal MAP | |
Marginal MAP problems are notoriously difficult tasks for graphical models. We
derive a general variational framework for solving marginal MAP problems,
in which we apply analogues of the Bethe, tree-reweighted, and mean field approximations.
We then derive a ``mixed'' message-passing algorithm and a convergent
alternative using CCCP to solve the BP-type approximations. Theoretically, we
give conditions under which the decoded solution is a global or local optimum,
and obtain novel upper bounds on solutions.
Experimentally we demonstrate that our algorithms outperform related approaches.
We also show that EM and variational EM are special cases
of our framework.
| Fault Detection via Nonparametric Belief Propagation | |
We consider the problem of identifying a pattern of faults from a set of noisy
linear measurements. Unfortunately, maximum a posteriori probability
estimation of the fault pattern is computationally intractable. To solve the
fault identification problem, we propose a non-parametric belief propagation
approach. We show empirically that our belief propagation solver is more
accurate than recent state-of-the-art algorithms including interior point
methods and semidefinite programming. Our superior performance is explained by
the fact that we take into account both the binary nature of the individual
faults and the sparsity of the fault pattern arising from their rarity.
| Revisiting MAP Estimation, Message Passing and Perfect Graphs | |
Given a graphical model, one of the most useful queries is to find the most
likely configuration of its variables. This task, known as the maximum a
posteriori (MAP) problem, can be solved efficiently via message passing
techniques when the graph is a tree, but is NP-hard for general graphs.
Jebara (2009) shows that the MAP problem can be converted into the
stable set problem, which can be solved in polynomial time for a broad class of
graphs known as perfect graphs via a linear programming relaxation technique.
This is a result of great theoretical interest. However, the article
additionally claims that max-product linear programming (MPLP) message passing
techniques of Globerson & Jaakkola (2007) are also guaranteed to solve these
problems exactly and efficiently. We investigate this claim, show that it does
not hold in general, and attempt to repair it with several alternative message
passing algorithms.
| Multicore Gibbs Sampling in Dense, Unstructured Graphs | |
Multicore computing is on the rise, but algorithms such as Gibbs sampling are
fundamentally sequential and may require careful redesign to parallelize.
Existing techniques either exploit sparse problem structure or make
approximations to the algorithm; in this work, we explore an alternative to
these ideas. We develop a parallel Gibbs sampling algorithm for shared-memory
systems that does not require any independence structure among the variables
yet does not approximate the sampling distributions. Our method uses a
look-ahead sampler, which uses bounds to attempt to sample variables before the
results of other threads are made available. We demonstrate our algorithm on
Gibbs sampling in Boltzmann machines and latent Dirichlet allocation (LDA). We
show in experiments that our algorithm achieves near linear speed-up in the
number of cores, is faster than existing exact samplers, and is nearly as fast
as approximate samplers while maintaining the correct stationary distribution.
| Learning Scale Free Networks by Reweighted L1 regularization | |
Methods for L1-type regularization have been widely used in Gaussian
graphical model selection tasks to encourage sparse structures. However, often
we would like to include more structural information than mere sparsity. In
this work, we focus on learning so-called ``scale-free'' models, a structural
property common to many real-world networks. We replace the L1
regularization with a power law regularization and optimize the objective
function by a sequence of iteratively reweighted L1 regularization
problems, where the regularization coefficients of nodes with high degree are
reduced, encouraging the appearance of hubs with high degree. Our method can be
easily adapted to improve any existing L1-based methods, such as
graphical lasso, neighborhood selection, and JSRM when the underlying networks
are believed to be scale free or have dominating hubs. We demonstrate in
simulation that our method significantly outperforms a baseline L1
method at learning scale-free networks and hub networks, and also illustrate
its behavior on gene expression data.
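A minimal sketch of the reweighting loop, using neighborhood selection (per-node Lasso) as the base L1 method. The simple 1/(degree + eps) weights and the use of sklearn are our own simplifications of the power-law-derived coefficients described above:

```python
# Iteratively reweighted L1 neighborhood selection (illustrative sketch).
import numpy as np
from sklearn.linear_model import Lasso

def reweighted_neighborhood(X, lam=0.1, iters=5, eps=1.0):
    n, p = X.shape
    B = np.zeros((p, p))                        # neighborhood regression coefficients
    w = np.ones(p)                              # per-node regularization weights
    for _ in range(iters):
        for i in range(p):
            idx = [j for j in range(p) if j != i]
            Xw = X[:, idx] / w[idx]             # column scaling = weighted L1 penalty
            c = Lasso(alpha=lam, max_iter=5000).fit(Xw, X[:, i]).coef_
            B[i, idx] = c / w[idx]
        deg = (np.abs(B) > 1e-6).sum(axis=0)    # current estimated degrees
        w = 1.0 / (deg + eps)                   # high degree -> smaller penalty (hubs)
    return B

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 10))              # placeholder data
print((np.abs(reweighted_neighborhood(X)) > 1e-6).sum(), "edges selected")
```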
| Understanding Errors in Approximate Distributed Latent Dirichlet Allocation | |
Latent Dirichlet allocation (LDA) is a popular algorithm for discovering semantic structure in
large collections of text or other data. Although its complexity is linear in the data
size, its use on increasingly massive collections has created considerable interest in
parallel implementations. ``Approximate distributed'' LDA, or AD-LDA,
approximates the popular collapsed Gibbs sampling algorithm for LDA models while
running on a distributed architecture. Although this algorithm often appears to perform well
in practice, its quality is not well understood theoretically or easily assessed on new data. In this work, we
theoretically justify the approximation, and modify AD-LDA to track an
error bound on performance. Specifically, we upper-bound the probability of making
a sampling error at each step of the algorithm (compared to an exact, sequential Gibbs sampler),
given the samples drawn thus far. We show empirically that our bound is sufficiently tight to give
a meaningful
and intuitive measure of approximation error in AD-LDA, allowing the user to track the
trade-off between accuracy and efficiency while executing in parallel.
| Nonparametric Belief Propagation | |
Continuous quantities are ubiquitous in models of real-world phenomena,
but are surprisingly difficult to reason about automatically. Probabilistic
graphical models such as Bayesian networks and Markov random fields,
and algorithms for approximate inference such as belief propagation,
have proven to be powerful
tools in a wide range of applications in statistics and artificial intelligence.
However, applying these methods to models with continuous variables remains
a challenging task. In this work we describe an extension of belief propagation
to continuous variable models, generalizing particle filtering and Gaussian mixture
filtering techniques for time series to more complex models. We illustrate the
power of the resulting nonparametric belief propagation algorithm via two applications:
kinematic tracking of visual motion and distributed localization in sensor networks.
| Negative Tree-reweighted Belief Propagation | |
We introduce a new class of lower bounds on the log partition function of a
Markov random field which makes use of a reversed Jensen's inequality.
In particular, our method approximates the intractable distribution using
a linear combination of spanning trees with negative weights. This technique
is a lower-bound counterpart to the tree-reweighted belief propagation algorithm,
which uses a convex combination of spanning trees with positive weights to
provide corresponding upper bounds. We develop algorithms to optimize and
tighten the lower bounds over the non-convex set of valid parameter values.
Our algorithm generalizes mean field approaches (including na\"ive and
structured mean field approximations), which it includes as a limiting case.
| Covering Trees and Lower Bounds on Quadratic Assignment | |
Many computer vision problems involving feature correspondence among images can
be formulated as an assignment problem with a quadratic cost function. Such
problems are computationally infeasible in general but recent advances in
discrete optimization such as tree-reweighted belief propagation (TRW)
often provide high-quality solutions. In this paper, we improve upon
these algorithms in two ways. First, we introduce covering trees, a variant of
TRW which provides the same bounds on the MAP energy as TRW with far fewer
variational parameters. Optimization of these parameters can be carried out
efficiently using either fixed--point iterations (as in TRW) or sub-gradient
based techniques. Second, we introduce a new technique that utilizes bipartite
matching applied to the min-marginals produced with covering trees in order to
compute a tighter lower-bound for the quadratic assignment problem. We apply
this machinery to the problem of finding correspondences with pairwise energy
functions, and demonstrate the resulting hybrid method outperforms TRW alone
and a recent related subproblem decomposition algorithm on benchmark image
correspondence problems.
| Particle Filtered MCMC-MLE with Connections to Contrastive Divergence | |
Learning undirected graphical models such as Markov random fields is an
important machine learning task with applications in many domains.
Since it is usually intractable to learn these models exactly, various
approximate learning techniques have been developed, such as contrastive
divergence (CD) and Markov chain Monte Carlo maximum likelihood estimation
(MCMC-MLE). In this paper, we introduce particle filtered MCMC-MLE, which is
a sampling-importance-resampling version of MCMC-MLE with
additional MCMC rejuvenation steps.
We also describe a unified view of (1) MCMC-MLE, (2) our particle filtering approach,
and (3) a stochastic approximation procedure known as persistent contrastive
divergence. We show how these approaches are related to each other
and discuss the relative merits of each approach.
Empirical results on various undirected models demonstrate that
the particle filtering technique we propose in this paper
can significantly outperform MCMC-MLE. Furthermore, in certain
cases, the proposed technique is faster than persistent CD.
| Learning with Blocks: Composite Likelihood and Contrastive Divergence | |
Composite likelihood methods provide a wide spectrum of computationally efficient
techniques for statistical tasks such as parameter estimation and model selection.
In this paper, we present a formal connection between the optimization of composite
likelihoods and the well-known contrastive divergence algorithm. In particular, we
show that composite likelihoods can be stochastically optimized by performing a
variant of contrastive divergence with random-scan blocked Gibbs sampling. By using
higher-order composite likelihoods, our proposed learning framework makes it
possible to trade off computation time for increased accuracy. Furthermore, one can
choose composite likelihood blocks that match the model's dependence structure,
making the optimization of higher-order composite likelihoods computationally
efficient. We empirically analyze the performance of blocked contrastive
divergence on various models, including visible Boltzmann machines, conditional
random fields, and exponential random graph models, and we demonstrate that using
higher-order blocks improves both the accuracy of parameter estimates and the rate
of convergence.
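A toy sketch of one step of this scheme on a fully visible Boltzmann machine (our own simplified rendering; blocks are kept small so the block conditional can be enumerated exactly):

```python
# Blocked contrastive divergence sketch for a visible Boltzmann machine,
# p(x) ~ exp(x' W x / 2) over binary x (illustrative only).
import numpy as np
from itertools import product

rng = np.random.default_rng(4)

def block_gibbs_step(x, W, block):
    """Resample the variables in `block` jointly given the rest, by exact
    enumeration over the 2^|block| block configurations."""
    configs = list(product([0, 1], repeat=len(block)))
    logp = np.empty(len(configs))
    for c, cfg in enumerate(configs):
        y = x.copy()
        y[block] = cfg
        logp[c] = y @ W @ y / 2.0
    p = np.exp(logp - logp.max())
    x = x.copy()
    x[block] = configs[rng.choice(len(configs), p=p / p.sum())]
    return x

def blocked_cd_gradient(W, data, block_size=2):
    """Stochastic gradient: data statistics minus statistics after one
    random-scan blocked Gibbs step started at each data point."""
    grad = np.zeros_like(W)
    for x in data:
        block = rng.choice(len(x), size=block_size, replace=False)
        x1 = block_gibbs_step(x, W, block)
        grad += np.outer(x, x) - np.outer(x1, x1)
    return grad / len(data)

# toy usage: a few gradient steps on random binary data
data = rng.integers(0, 2, size=(20, 6))
W = np.zeros((6, 6))
for _ in range(50):
    W += 0.05 * blocked_cd_gradient(W, data)
print(np.round(W, 2))
```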
| Estimating Replicate Time-Shifts Using Gaussian Process Regression | |
Motivation: Time-course gene expression datasets provide important insights into dynamic aspects of biological processes, such as circadian rhythms, cell cycle and organ development. In a typical microarray time-course experiment, measurements are obtained at each time point from multiple replicate samples. Accurately recovering the gene expression patterns from experimental observations is made challenging by both measurement noise and variation among replicates' rates of development. Prior work on this topic has focused on inference of expression patterns assuming that the replicate times are synchronized. We develop a statistical approach that simultaneously infers both (i) the underlying (hidden) expression profile for each gene, as well as (ii) the biological time for each individual replicate. Our approach is based on Gaussian process regression (GPR) combined with a probabilistic model that accounts for uncertainty about the biological development time of each replicate.
Results: We apply GPR with uncertain measurement times to a microarray dataset of mRNA expression for the hair-growth cycle in mouse back skin, predicting both profile shapes and biological times for each replicate. The predicted time shifts show high consistency with independently obtained morphological estimates of relative development. We also show that the method systematically reduces prediction error on out-of-sample data, significantly reducing the mean squared error in a cross-validation study.
Availability: Matlab code for GPR with uncertain time shifts is available at http://sli.ics.uci.edu/Code/GPRTimeshift/
Contact: ihler@ics.uci.edu
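A minimal sketch of the central idea (ours, not the released GPRTimeshift code): treat one replicate's shift as unknown and score candidate shifts by the marginal likelihood of a single pooled GP fit on shifted inputs.

```python
# GP regression with an unknown per-replicate time shift (illustrative).
import numpy as np

def rbf(a, b, ell=2.0, sf=1.0):
    return sf**2 * np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def log_marglik(t, y, noise=0.1):
    """Standard GP log marginal likelihood with an RBF kernel."""
    K = rbf(t, t) + noise**2 * np.eye(len(t))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(t) * np.log(2 * np.pi)

# two replicates of a shared profile; replicate 2 develops `true_shift` late
rng = np.random.default_rng(5)
t = np.linspace(0, 10, 12)
true_shift = 1.3
y1 = np.sin(t) + 0.1 * rng.standard_normal(12)
y2 = np.sin(t - true_shift) + 0.1 * rng.standard_normal(12)

# profile the shift of replicate 2 on a grid, scoring the pooled GP fit
grid = np.linspace(-3, 3, 61)
scores = [log_marglik(np.concatenate([t, t - s]), np.concatenate([y1, y2]))
          for s in grid]
print("estimated shift:", grid[int(np.argmax(scores))])
```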
| Particle-Based Variational Inference for Continuous Systems | |
Since the development of loopy belief propagation, there has been considerable
work on advancing the state of the art for approximate inference over
distributions defined on discrete random variables. Improvements include
guarantees of convergence, approximations that are provably more accurate, and
bounds on the results of exact inference. However, extending these methods to
continuous-valued systems has lagged behind. While several methods have been
developed to use belief propagation on systems with continuous values, recent
advances for discrete variables have not as yet been incorporated.
In this context we extend a recently proposed particle-based belief propagation
algorithm to provide a general framework for adapting discrete message-passing
algorithms to inference in continuous systems. The resulting
algorithms behave similarly to their purely discrete counterparts, extending
the benefits of these more advanced inference techniques to the continuous
domain.
| Bayesian detection of non-sinusoidal periodic patterns in circadian expression data | |
Motivation: Cyclical biological processes such as cell division and circadian regulation produce coordinated periodic expression of thousands of genes. Identification of such genes and their expression patterns is a crucial step in discovering underlying regulatory mechanisms. Existing computational methods are biased toward discovering genes that follow sine-wave patterns.
Results: We present an analysis of variance (ANOVA) periodicity detector and its Bayesian extension that can be used to discover periodic transcripts of arbitrary shapes from replicated gene expression profiles. The models are applicable when the profiles are collected at comparable time points for at least two cycles. We provide an empirical Bayes procedure for estimating parameters of the prior distributions and derive closed-form expressions for the posterior probability of periodicity, enabling efficient computation. The model is applied to two datasets profiling circadian regulation in murine liver and skeletal muscle, revealing a substantial number of previously undetected non-sinusoidal periodic transcripts in each. We also apply quantitative real-time PCR to several highly ranked non-sinusoidal transcripts in liver tissue found by the model, providing independent evidence of circadian regulation of these genes.
Availability: MATLAB software for estimating prior distributions and performing inference is available for download from http://www.datalab.uci.edu/resources/periodicity/.
Contact: dchudova@gmail.com
| Bounding Sample Errors in Approximate Distributed Latent Dirichlet Allocation | |
Latent Dirichlet allocation (LDA) is a popular algorithm for discovering structure in
large collections of text or other data. Although its complexity is linear in the data
size, its use on increasingly massive collections has created considerable interest in
parallel implementations. ``Approximate distributed'' LDA, or AD-LDA,
approximates the popular collapsed Gibbs sampling algorithm for LDA models while
running on a distributed architecture. Although this algorithm often appears to perform well
in practice, its quality is not well understood or easily assessed. In this work, we
provide some theoretical justification of the algorithm, and modify AD-LDA to track an
error bound on its performance. Specifically, we upper-bound the probability of making
a sampling error at each step of the algorithm (compared to an exact, sequential Gibbs sampler),
given the samples drawn thus far. We show empirically that our bound is sufficiently tight to give
a meaningful
and intuitive measure of approximation error in AD-LDA, allowing the user to understand the
trade-off between accuracy and efficiency.
| Adaptive Updates for MAP Configurations with Applications to Bioinformatics | |
Many applications involve repeatedly computing the optimal, maximum
a posteriori (MAP) configuration of a graphical model as the
model changes, often slowly or incrementally over time, e.g., due to
input from a user.
Small changes to the model often require
updating only a small fraction of the MAP configuration, suggesting
the possibility of performing updates faster than recomputing from
scratch.
In this paper we present an algorithm for efficiently performing such
updates under arbitrary changes to the model.
Our algorithm is within a logarithmic factor of the optimal and is
asymptotically never slower than recomputing from scratch: if a
modification to the model requires $m$ updates to the MAP
configuration of $n$ random variables, then our algorithm requires
$O(m\log{(n/m)})$ time; recomputing from scratch requires $O(n)$
time.
We evaluate the practical effectiveness of our algorithm by
considering two problems in genomic signal processing, CpG region
segmentation and protein sidechain packing, where a MAP configuration
must be repeatedly updated. Our results show significant speedups
over recomputing from scratch.
| A Low Density Lattice Decoder via Non-parametric Belief Propagation | |
The recent work of Sommer, Feder and Shalvi presented a new family of codes called low density lattice codes (LDLC)
that can be decoded efficiently and approach the capacity of the AWGN channel. They give a linear-time
iterative decoding scheme based on a message-passing formulation on a factor graph.
In the current work we report our theoretical findings regarding the relation between the LDLC decoder and belief propagation. We show that the LDLC decoder is an instance of non-parametric belief propagation and further connect it to the Gaussian belief propagation algorithm. Our new results enable borrowing knowledge from the non-parametric and Gaussian belief propagation domains into the LDLC domain. Specifically, we give more general conditions for convergence of the LDLC decoder (under the same assumptions as the original LDLC convergence analysis). We discuss how to extend the LDLC decoder from Latin square to full-rank, non-square matrices. We propose an efficient construction of a sparse generator matrix and its matching decoder. We report preliminary experimental results which show our decoder has a symbol error rate comparable to that of the original LDLC decoder.
| Circadian Clock Genes Contribute to the Regulation of Hair Follicle Cycling | |
The hair follicle renews itself by repeatedly cycling among growth, regression,
and rest phases. One function of hair follicle cycling is to allow seasonal
changes in hair growth. Understanding the regulation of hair follicle cycling
is also of interest because abnormal regulation of hair cycle control genes is
responsible for several types of human hair growth disorders and skin cancers.
We report here that Clock and Bmal1 genes,
which control circadian rhythms, are also important for the regulation of hair
follicle cycling, a biological process of much longer duration than 24 hours.
Detailed analysis of skin from mice mutated for central clock genes indicates a
significant delay in the progression of the hair growth phase. We show that
clock genes affect the expression of key cell cycle control genes and that
keratinocytes in a critical compartment of the hair follicles in
Bmal1 mutant mice are halted in the G1 phase of the cell
cycle. These findings provide novel insight into circadian control mechanisms
in modulating the progression of cyclic biological processes on different time
scales.
| Particle Belief Propagation | |
The popularity of particle filtering for inference in Markov chain
models defined over random variables with very large or continuous
domains makes it natural to consider sample--based versions of belief
propagation (BP) for more general (tree--structured or loopy) graphs.
Already, several such algorithms have been proposed in the
literature. However, many questions remain open about the behavior
of particle--based BP algorithms, and little theory
has been developed to analyze their performance.
In this paper, we describe a generic particle belief propagation (PBP)
algorithm which is closely related to previously proposed methods.
We prove that this algorithm is consistent, approaching the true BP
messages as the number of samples grows large.
We then use concentration bounds to analyze the finite-sample
behavior and give a convergence rate for the algorithm on
tree--structured graphs. Our convergence rate is $O(1/\sqrt{n})$
where $n$ is the number of samples, independent of the domain size
of the variables.
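A minimal sketch of a PBP message update of the kind analyzed here: the outgoing message is an importance-weighted average over the sending node's particles, evaluated at the receiving node's particles (the names and toy potentials are our own):

```python
# Particle belief propagation message update (illustrative sketch).
import numpy as np

rng = np.random.default_rng(6)

def pbp_message(x_u, q_u, x_v, psi, phi_u, in_msgs):
    """Estimate the BP message m_{u->v} at particle locations x_v, using
    particles x_u drawn from a proposal with density values q_u.

    in_msgs: list of arrays of incoming message values (at x_u) from
    u's neighbors other than v; may be empty."""
    incoming = np.prod(in_msgs, axis=0) if in_msgs else np.ones_like(x_u)
    weights = phi_u(x_u) * incoming / q_u          # importance correction
    m = np.array([np.mean(weights * psi(x_u, xv)) for xv in x_v])
    return m / m.sum()                             # normalization is arbitrary

# toy usage: a smoothness potential between two real-valued nodes
n = 500
x_u = 2.0 * rng.standard_normal(n)                           # proposal N(0, 2^2)
q_u = np.exp(-0.5 * (x_u / 2.0) ** 2) / (2.0 * np.sqrt(2 * np.pi))
x_v = np.linspace(-3, 3, 7)                                  # evaluation particles
psi = lambda a, b: np.exp(-0.5 * (a - b) ** 2)               # pairwise potential
phi = lambda a: np.exp(-0.5 * (a - 1.0) ** 2)                # local evidence at u
print(pbp_message(x_u, q_u, x_v, psi, phi, []))
```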
| Fast Collapsed Gibbs Sampling for Latent Dirichlet Allocation | |
In this paper we introduce a novel collapsed Gibbs sampling method for the
widely used latent Dirichlet allocation (LDA) model. Our
new method results in significant speedups on real world text
corpora. Conventional Gibbs sampling schemes for LDA require
O(K) operations per sample where K is the number of
topics in the model. Our proposed method draws equivalent
samples but on average requires significantly fewer than K operations per sample.
On real-world corpora FastLDA can be as much as 8
times faster than the standard collapsed Gibbs sampler for LDA. No approximations are
necessary, and we show that our fast sampling
scheme produces exactly the same results as the standard
(but slower) sampling scheme. Experiments on four real world
data sets demonstrate speedups for a wide range of
collection sizes. For the PubMed collection of over 8 million
documents with a required computation time of 6 CPU months for LDA,
our speedup of 5.7 can save 5 CPU months of computation.
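For reference, a sketch of the standard O(K) collapsed Gibbs update that FastLDA accelerates; this shows the distribution being sampled, not the FastLDA bounding machinery (the array names are our own):

```python
# Standard collapsed Gibbs update for one LDA token (illustrative baseline).
import numpy as np

rng = np.random.default_rng(7)

def gibbs_resample_token(z, d, w, ndk, nkw, nk, alpha, beta, W):
    """O(K) update: p(z=k | rest) ~ (n_dk + alpha)(n_kw + beta)/(n_k + W beta).
    FastLDA draws from this same distribution, but visits topics in a
    data-driven order with upper bounds so it can usually stop early."""
    ndk[d, z] -= 1; nkw[z, w] -= 1; nk[z] -= 1          # remove current assignment
    p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + W * beta)
    z_new = rng.choice(len(p), p=p / p.sum())
    ndk[d, z_new] += 1; nkw[z_new, w] += 1; nk[z_new] += 1
    return z_new

# toy usage: 2 documents, 5 word types, 3 topics, one token resampled
D, W, K = 2, 5, 3
ndk = np.zeros((D, K), int); nkw = np.zeros((K, W), int); nk = np.zeros(K, int)
ndk[0, 1] += 1; nkw[1, 2] += 1; nk[1] += 1              # token (d=0, w=2) in topic 1
print(gibbs_resample_token(1, 0, 2, ndk, nkw, nk, alpha=0.1, beta=0.01, W=W))
```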
| Probabilistic Analysis of a Large Scale Urban Traffic Sensor Data Set | |
Real-world sensor time series are often significantly noisier
and more difficult to work with than the relatively clean
data sets that tend to be used as the basis for experiments
in many research papers. In this paper we report on a large
case-study involving statistical data mining of over 300 million
measurements from 1700 freeway traffic sensors over a
period of seven months in Southern California. We discuss
the challenges posed by the wide variety of different sensor
failures and anomalies present in the data. The volume and
complexity of the data preclude the use of manual
visualization or simple thresholding techniques to identify these
anomalies. We describe the application of probabilistic
modeling and unsupervised learning techniques to this data set
and illustrate how these approaches can successfully detect
underlying systematic patterns even in the presence of
substantial noise and missing data.
| Adaptive Inference in General Graphical Models | |
Many algorithms and applications involve repeatedly solving
variations of the same inference problem; for example, we may want to
introduce new evidence to the model or perform updates to
conditional dependencies. The goal of \emph{adaptive inference} is
to take advantage of what is preserved in the model and perform
inference more rapidly than from scratch. In this paper, we
describe techniques for adaptive inference on general graphs that
support marginal computation and updates to the conditional
probabilities and dependencies in logarithmic time. We give
experimental results for an implementation of our algorithm, and
demonstrate its potential performance benefit in the study of
protein structure.
| Modeling Count Data from Multiple Sensors: A Building Occupancy Model | |
Knowledge of the number of people in a building at a given time is
crucial for applications such as emergency response. Sensors can be
used to gather noisy measurements which when combined, can be used
to make inferences about the location, movement and density of
people. In this paper we describe a probabilistic model for
predicting the occupancy of a building using networks of
people-counting sensors. This model provides robust predictions
given typical sensor noise as well as missing and corrupted data
from malfunctioning sensors. We experimentally validate the model
by comparing it to a baseline method using real data from a network
of optical counting sensors in a campus building.
| Adaptive Bayesian Inference | |
Motivated by stochastic systems in which observed evidence and
conditional dependencies between states of the network change over
time, and certain quantities of interest (marginal distributions,
likelihood estimates etc.) must be updated, we study the problem of
\emph{adaptive inference} in tree-structured Bayesian networks. We
describe an algorithm for adaptive inference that handles a broad
range of changes to the network and is able to maintain marginal
distributions, MAP estimates, and data likelihoods in all expected
logarithmic time. We give an implementation of our algorithm and
provide experiments that show that the algorithm can yield up to two
orders of magnitude speedups on answering queries and responding to
dynamic changes over the sum-product algorithm.
| Learning to detect events with Markov-modulated Poisson processes | |
Time-series of count data occur in many different contexts, including Internet navigation logs, freeway traffic monitoring, and security logs associated with buildings. In this article we describe a framework for detecting anomalous events in such data using an unsupervised learning approach. Normal periodic behavior is modeled via a time-varying Poisson process model, which in turn is modulated by a hidden Markov process that accounts for bursty events. We outline a Bayesian framework for learning the parameters of this model from count time-series. Two large real-world datasets of time-series counts are used as testbeds to validate the approach, consisting of freeway traffic data and logs of people entering and exiting a building. We show that the proposed model is significantly more accurate at detecting known events than a more traditional threshold-based technique. We also describe how the model can be used to investigate different degrees of periodicity in the data, including systematic day-of-week and time-of-day effects, and to make inferences about different aspects of events such as number of vehicles or people involved. The results indicate that the Markov-modulated Poisson framework provides a robust and accurate framework for adaptively and autonomously learning how to separate unusual bursty events from traces of normal human activity.
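A minimal sketch of the filtering computation in such a model, assuming a two-state chain (normal vs. event) and known parameters; the article itself learns the parameters in a Bayesian framework, and the numbers below are placeholders:

```python
# Forward filtering for a 2-state Markov-modulated Poisson model (sketch).
import numpy as np
from scipy.stats import poisson

def mmpp_filter(counts, base_rate, event_rate, A):
    """State 0 = normal (rate base_rate[t]); state 1 = event (adds event_rate).
    A is the 2x2 hidden-state transition matrix.  Returns P(event | counts)."""
    alpha = np.array([0.99, 0.01])                  # initial state belief
    p_event = []
    for t, n in enumerate(counts):
        lam = np.array([base_rate[t], base_rate[t] + event_rate])
        alpha = (alpha @ A) * poisson.pmf(n, lam)   # predict, then weight by data
        alpha /= alpha.sum()
        p_event.append(alpha[1])
    return np.array(p_event)

rng = np.random.default_rng(8)
A = np.array([[0.95, 0.05], [0.30, 0.70]])
base = np.full(24, 10.0)          # flat baseline for the sketch; periodic in practice
counts = np.concatenate([rng.poisson(10, 16), rng.poisson(25, 8)])
print(np.round(mmpp_filter(counts, base, event_rate=15.0, A=A), 2))
```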
| Accuracy Bounds for Belief Propagation | |
The belief propagation algorithm is widely applied to perform
approximate inference on arbitrary graphical models, in part due to
its excellent empirical properties and performance. However, little
is known theoretically about when this algorithm will perform well.
Using recent analysis of convergence and stability properties in
belief propagation and new results on approximations in binary
systems, we derive a bound on the error in BP's estimates for
pairwise Markov random fields over discrete--valued random variables.
Our bound is relatively simple to compute, and compares favorably
with a previous method for bounding the accuracy of
belief propagation.
| Graphical Models for Statistical Inference and Data Assimilation | |
In data assimilation for a system which evolves in time, one combines past and current observations with a model of the dynamics of the system, in order to improve the simulation of the system as well as any future predictions about it. From a statistical point of view, this process can be regarded as estimating many random variables which are related both spatially and temporally: given observations of some of these variables, typically corresponding to times past, we require estimates of several others, typically corresponding to future times.
Graphical models have emerged as an effective formalism for assisting in these types of inference tasks, particularly for large numbers of
random variables. Graphical models provide a means of representing dependency structure among the variables, and can provide both intuition
and efficiency in estimation and other inference computations. We provide an overview and introduction to graphical models, and describe how
they can be used to represent statistical dependency and how the resulting structure can be used to organize computation. The relation between
statistical inference using graphical models and optimal sequential estimation algorithms such as Kalman filtering is discussed. We then give
several additional examples of how graphical models can be applied to climate dynamics, specifically estimation using multi-resolution models of
large-scale data sets such as satellite imagery, and learning hidden Markov models to capture rainfall patterns in space and time.
| Learning Time-Intensity Profiles of Human Activity Using Nonparametric Bayesian Models | |
Data sets that characterize human activity over time through collections of timestamped
events or counts are of increasing interest in application areas such as human-computer
interaction, video surveillance, and Web data analysis. We propose a
non-parametric Bayesian framework for modeling collections of such data. In
particular, we use a Dirichlet process framework for learning a set of intensity
functions corresponding to different categories, which form a basis set for representing
individual time-periods (e.g., several days) depending on which categories
the time-periods are assigned to. This allows the model to learn in a data-driven
fashion what factors are generating the observations on a particular day, including
(for example) weekday versus weekend effects or day-specific effects corresponding
to unique (single-day) occurrences of unusual behavior, sharing information
where appropriate to obtain improved estimates of the behavior associated
with each category. Applications to real-world data sets of count data involving
both vehicles and people are used to illustrate the technique.
| Adaptive Event Detection with Time-Varying Poisson Processes | |
Time-series of count data are generated in many different contexts, such as web access logging, freeway traffic monitoring, and security logs associated with buildings. Since this data measures the aggregated behavior of individual human beings, it typically exhibits a periodicity in time on a number of scales (daily, weekly, etc.) that reflects the rhythms of the underlying human activity and makes the data appear non-homogeneous. At the same time, the data is often corrupted by a number of bursty periods of unusual behavior such as building events, traffic accidents, and so forth. The data mining problem of finding and extracting these anomalous events is made difficult by both of these elements. In this paper we describe a framework for unsupervised learning in this context, based on a time-varying Poisson process model that can also account for anomalous events. We show how the parameters of this model can be learned from count time series using statistical estimation techniques. We demonstrate the utility of this model on two datasets for which we have partial ground truth in the form of known events, one from freeway traffic data and another from building access data, and show that the model performs significantly better than a non-probabilistic, threshold-based technique. We also describe how the model can be used to investigate different degrees of periodicity in the data, including systematic day-of-week and time-of-day effects, and make inferences about the detected events (e.g., popularity or level of attendance). Our experimental results indicate that the proposed time-varying Poisson model provides a robust and accurate framework for adaptively and autonomously learning how to separate unusual bursty events from traces of normal human activity.
| Gibbs Sampling for (Coupled) Infinite Mixture Models in the Stick Breaking Representation | |
Nonparametric Bayesian approaches to clustering,
information retrieval, language modeling
and object recognition have recently shown great
promise as a new paradigm for unsupervised data
analysis. Most contributions have focused on the
Dirichlet process mixture models or extensions
thereof for which efficient Gibbs samplers exist.
In this paper we explore Gibbs samplers for
infinite complexity mixture models in the stick
breaking representation. The advantage of this
representation is improved modeling flexibility.
For instance, one can design the prior distribution
over cluster sizes or couple multiple infinite
mixture models (e.g., over time) at the level
of their parameters (i.e., the dependent Dirichlet
process model). However, Gibbs samplers for
infinite mixture models (as recently introduced in
the statistics literature) seem to mix poorly over
cluster labels. Among other issues, this can have
the adverse effect that labels for the same cluster
in coupled mixture models are mixed up. We
introduce additional moves in these samplers to
improve mixing over cluster labels and to bring
clusters into correspondence. An application to
modeling of storm trajectories is used to illustrate
these ideas.
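For concreteness, here is a draw of mixture weights from the (truncated) stick-breaking construction these samplers operate on; the truncation level and renormalization are our own simplifications:

```python
# Truncated stick-breaking draw of infinite-mixture weights (sketch).
import numpy as np

rng = np.random.default_rng(9)

def stick_breaking(alpha, K):
    """v_k ~ Beta(1, alpha);  pi_k = v_k * prod_{j<k} (1 - v_j).
    Other Beta(a, b) choices change the prior over cluster sizes."""
    v = rng.beta(1.0, alpha, size=K)
    pi = v * np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return pi / pi.sum()          # fold the truncated remainder back in

print(np.round(stick_breaking(alpha=2.0, K=10), 3))
```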
| Distributed Fusion in Sensor Networks | |
Distributed inference methods developed for graphical models constitute a principled
approach for data fusion in sensor networks. The application of these methods, however,
requires some care due to a number of issues that are particular to sensor networks.
Chief among these are the distributed nature of computation and
deployment coupled with communications bandwidth and energy constraints typical
of many sensor networks. Additionally, information sharing in a sensor network necessarily
involves approximation. Traditional measures of distortion are not sufficient to characterize the
quality of approximation, as they do not explicitly address the resulting impact on inference,
which is at the core of many data fusion problems. While both graphical models and a distributed
sensor network have network structures associated with them, the mapping is not one to one.
All of these issues complicate the mapping of a particular inference problem to a given sensor network
structure. Indeed, there may be a variety of mappings with very different characteristics with
regard to computational complexity and utilization of resources. Nevertheless, it is the case that
many of the powerful distributed inference methods have a role in information fusion for sensor
networks. In this article we present an overview of research conducted by the authors that has
sought to clarify many of the important issues at the intersection
of these domains. We discuss both theoretical issues and prototypical
applications in addition to suggesting new lines of reasoning.
| Particle Filtering Under Communications Constraints | |
Particle filtering is often applied to the problem of object tracking
under non-Gaussian uncertainty; however, sensor networks frequently
require that the implementation be local to the region of interest,
eventually forcing the large, sample-based representation to be moved
among power-constrained sensors. We consider the problem of successive
approximation (i.e., lossy compression) of each sample-based density
estimate, in particular exploring the consequences (both theoretical and
empirical) of several possible choices of loss function and their
interpretation in terms of future errors in inference, justifying their
use for measuring approximations in distributed particle filtering.
| Estimating Dependency and Significance for High-Dimensional Data | |
Understanding the dependency structure of a set of variables is a
key component in various signal processing applications which involve
data association. The simple task of detecting whether any
dependency exists is particularly difficult when models of the data
are unknown or difficult to characterize because of high-dimensional
measurements. We review the use of nonparametric tests for characterizing
dependency and how to carry out these tests with high-dimensional
observations. In addition we present a method to assess
the significance of the tests.
| Nonparametric Belief Propagation for Sensor Network Self-Calibration | |
Automatic self-localization is a critical need for the effective use of
ad-hoc sensor networks in military or civilian applications. In general,
self-localization involves the combination of absolute location
information (e.g., GPS) with relative calibration information (e.g., distance
measurements between sensors) over regions of the network. Furthermore, it
is generally desirable to distribute the computational burden across the
network and minimize the amount of inter-sensor communication. We
demonstrate that the information used for sensor localization is
fundamentally local with regard to the network topology and use this
observation to reformulate the problem within a graphical model framework.
We then present and demonstrate the utility of \emph{nonparametric belief
propagation} (NBP), a recent generalization of particle filtering, for
both estimating sensor locations and representing location uncertainties.
NBP has the advantage that it is easily implemented in a distributed
fashion, admits a wide variety of statistical models, and can represent
multi-modal uncertainty. Using simulations of small- to moderately-sized
sensor networks, we show that NBP may be made robust to outlier
measurement errors by a simple model augmentation, and that judicious
message construction can result in better estimates. Furthermore, we
provide an analysis of NBP's communications requirements, showing that
typically only a few messages per sensor are required, and that even low
bit-rate approximations of these messages can have little or no
performance impact.
| Loopy Belief Propagation: Convergence and Effects of Message Errors | |
Belief propagation (BP) is an increasingly popular method of performing
approximate inference on arbitrary graphical models. At times, even
further approximations are required, whether due to quantization of the
messages or model parameters, from other simplified message or model
representations, or from stochastic approximation methods. The
introduction of such errors into the BP message computations has the
potential to affect the solution obtained adversely. We analyze the
effect resulting from message approximation under two particular measures
of error, and show bounds on the accumulation of errors in the system.
This analysis leads to convergence conditions for traditional BP message
passing, and both strict bounds and estimates of the resulting error in
systems of approximate BP message passing.
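One of the error measures used in this line of analysis is the dynamic range of the multiplicative error between an approximate message and its exact counterpart; a small sketch of computing it (the quantization example is our own):

```python
# Dynamic-range measure of message error (sketch).
import numpy as np

def log_dynamic_range(m_approx, m_true):
    """log d(e) = (1/2) max_{a,b} log[e(a)/e(b)] for e = m_approx / m_true;
    invariant to how either message is normalized."""
    e = np.log(m_approx) - np.log(m_true)
    return 0.5 * (e.max() - e.min())

m_true = np.array([0.55, 0.33, 0.12])
m_q = np.round(m_true, 1)                 # e.g. a coarsely quantized message
print(log_dynamic_range(m_q / m_q.sum(), m_true))
```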
| Message Errors in Belief Propagation | |
Belief propagation (BP) is an increasingly popular method of performing
approximate inference on arbitrary graphical models. At times, even
further approximations are required, whether from quantization or other
simplified message representations or from stochastic approximation
methods. Introducing such errors into the BP message computations has
the potential to adversely affect the solution obtained. We analyze this
effect with respect to a particular measure of message error, and show
bounds on the accumulation of errors in the system. This leads both to
convergence conditions and error bounds in traditional and approximate BP
message passing.
| Nonparametric Hypothesis Tests for Statistical Dependency | |
Determining the structure of dependencies among a set of variables is a
common task in many signal and image processing applications, including
multi-target tracking and computer vision. In this paper we present an
information-theoretic, machine learning approach to problems of this type.
We cast this problem as a hypothesis test between factorizations of
variables into mutually independent subsets. We show that the likelihood
ratio can be written as sums of two sets of Kullback-Leibler (KL)
divergence terms. The first set captures the structure of the statistical
dependencies within each hypothesis, while the second set measures the
details of model differences between hypotheses. We then consider the
case when the signal prior models are unknown, so that the distributions
of interest must be estimated directly from data, showing that the second
set of terms is (asymptotically) negligible and quantifying the loss in
hypothesis separability when the models are completely unknown. We
demonstrate the utility of nonparametric estimation methods for such
problems, providing a general framework for determining and distinguishing
between dependency structures in highly uncertain environments.
Additionally, we develop a machine learning approach for estimating lower
bounds on KL-divergence and mutual information from samples of
high-dimensional random variables for which direct density estimation is
infeasible. We present empirical results in the context of three
prototypical applications: association of signals generated by sources
possessing harmonic behavior, scene correspondence using video imagery,
and detection of coherent behavior among sets of moving objects.
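A small sketch of the nonparametric ingredients described above: a kernel density estimate of mutual information (the KL divergence between the joint and the product of marginals), with a permutation test supplying significance. Kernel choice and permutation count here are illustrative:

```python
# KDE-based dependence test between two scalar signals (sketch).
import numpy as np
from scipy.stats import gaussian_kde

def kde_mutual_information(x, y):
    """Estimate I(X;Y) = KL(p(x,y) || p(x)p(y)) by evaluating kernel
    density estimates at the samples themselves."""
    xy = np.vstack([x, y])
    return np.mean(np.log(gaussian_kde(xy)(xy))
                   - np.log(gaussian_kde(x)(x))
                   - np.log(gaussian_kde(y)(y)))

rng = np.random.default_rng(10)
x = rng.standard_normal(500)
y = 0.8 * x + 0.6 * rng.standard_normal(500)        # dependent pair
mi_hat = kde_mutual_information(x, y)

# permutation test: destroying the pairing calibrates the null distribution
null = [kde_mutual_information(x, rng.permutation(y)) for _ in range(20)]
print(mi_hat, np.mean(mi_hat <= np.array(null)))    # estimate and approx. p-value
```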
| Communications-Constrained Inference | |
In many applications, particularly power-constrained sensor networks, it is important to conserve the amount of data exchanged while maximizing the utility of that data for some inference task. Broadly, this tradeoff has two major cost components: the representation's size (in distributed networks, the communications cost) and the error incurred by its use (the inference cost).
We analyze this tradeoff for a particular problem: communicating a particle-based representation (and more generally, a Gaussian mixture or kernel density estimate). We begin by characterizing the exact communication cost of these representations, noting that it is less than might be suggested by traditional communications theory due to the invariance of the representation to reordering. We describe the optimal, lossless encoder when the generating distribution is known, and pose a sub-optimal encoder which still benefits from reordering invariance.
However, lossless encoding may not be sufficient. We describe one reasonable measure of
error for distribution-based messages and its consequences for inference in an acyclic network,
and propose a novel density approximation method based on KD-tree multiscale representations
which enables the communications cost and a bound on error to be balanced efficiently. We
show several empirical examples demonstrating the method's utility in collaborative, distributed
signal processing under bandwidth or power constraints.
| Nonparametric Belief Propagation for Sensor Network Self-Calibration | |
Automatic self-calibration of ad-hoc sensor networks is a critical need
for their use in military or civilian applications. In general,
self-calibration involves the combination of absolute location information
(e.g. GPS) with relative calibration information (e.g. estimated distance
between sensors) over regions of the network. We formulate the
self-calibration problem as a graphical model, enabling application of
nonparametric belief propagation (NBP), a recent generalization of
particle filtering, for both estimating sensor locations and representing
location uncertainties. NBP has the advantage that it is easily
implemented in a distributed fashion, can represent multi-modal
uncertainty, and admits a wide variety of statistical models. This last
point is particularly appealing in that it can be used to provide
robustness against occasional high-variance (outlier) noise. We
illustrate the performance of NBP using Monte Carlo analysis on an example
network.
| Nonparametric Belief Propagation for Self-Calibration in Sensor Networks | |
Automatic self-calibration of ad-hoc sensor networks is a critical need
for their use in military or civilian applications. In general,
self-calibration involves the combination of absolute location information
(e.g. GPS) with relative calibration information (e.g. time delay or
received signal strength between sensors) over regions of the network.
Furthermore, it is generally desirable to distribute the computational
burden across the network and minimize the amount of inter-sensor
communication. We demonstrate that the information used for sensor
calibration is fundamentally local with regard to the network topology and
use this observation to reformulate the problem within a graphical model
framework. We then demonstrate the utility of \emph{nonparametric belief
propagation} (NBP), a recent generalization of particle filtering, for
both estimating sensor locations and representing location uncertainties.
NBP has the advantage that it is easily implemented in a distributed
fashion, admits a wide variety of statistical models, and can represent
multi-modal uncertainty. We illustrate the performance of NBP on several
example networks while comparing to a previously published nonlinear least
squares method.
| Efficient Multiscale Sampling from Products of Gaussian Mixtures | |
The problem of approximating the product of several Gaussian mixture
distributions arises in a number of contexts, including the nonparametric
belief propagation (NBP) inference algorithm and the training of
product of experts models.
This paper develops two multiscale algorithms for sampling from a
product of Gaussian mixtures, and compares their performance to existing
methods. The first is a multiscale variant of previously proposed
Monte Carlo techniques, with comparable
theoretical guarantees but improved empirical convergence rates.
The second makes use of approximate kernel density evaluation
methods to construct a fast approximate sampler, which is
guaranteed to sample points to within a tunable parameter $\epsilon$
of their true probability.
We compare both multiscale samplers on a set of computational examples
motivated by NBP, demonstrating significant improvements over existing
methods.
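For scale, the exact baseline that the multiscale samplers are designed to beat enumerates every pairwise product component, which is quadratic in the number of mixture components (and exponential in the number of mixtures multiplied). A minimal 1-d sketch of that baseline:

```python
# Exact sampling from the product of two 1-d Gaussian mixtures by pair
# enumeration -- the costly baseline (illustrative sketch).
import numpy as np

rng = np.random.default_rng(11)

def sample_product(w1, mu1, s1, w2, mu2, s2, n):
    var = 1.0 / (1 / s1[:, None] ** 2 + 1 / s2[None, :] ** 2)     # pair variances
    mu = var * (mu1[:, None] / s1[:, None] ** 2 + mu2[None, :] / s2[None, :] ** 2)
    # pair weight: w_i * w_j, times how much the two components overlap
    s2sum = s1[:, None] ** 2 + s2[None, :] ** 2
    overlap = np.exp(-0.5 * (mu1[:, None] - mu2[None, :]) ** 2 / s2sum)
    overlap /= np.sqrt(2 * np.pi * s2sum)
    w = (w1[:, None] * w2[None, :] * overlap).ravel()
    idx = rng.choice(w.size, size=n, p=w / w.sum())
    return rng.normal(mu.ravel()[idx], np.sqrt(var.ravel()[idx]))

w = np.array([0.5, 0.5])
print(sample_product(w, np.array([-1.0, 1.0]), np.array([0.5, 0.5]),
                     w, np.array([0.0, 2.0]), np.array([0.7, 0.7]), 5))
```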
| Nonparametric Belief Propagation | |
In many applications of graphical models arising in computer vision, the hidden variables of interest
are most naturally specified by continuous, non-Gaussian distributions. There exist inference algorithms
for discrete approximations to these continuous distributions, but for the high-dimensional variables
typically of interest, discrete inference becomes infeasible. Stochastic methods such as particle
filters provide an appealing alternative. However, existing techniques fail to exploit the rich
structure of the graphical models describing many vision problems. Drawing on ideas from regularized
particle filters and belief propagation (BP), this paper develops a nonparametric belief propagation
(NBP) algorithm applicable to general graphs. Each NBP iteration uses an efficient sampling procedure
to update kernel-based approximations to the true, continuous likelihoods. The algorithm can accommodate
an extremely broad class of potential functions, including nonparametric representations. Thus, NBP
extends particle filtering methods to the more general vision problems that graphical models can
describe. We apply the NBP algorithm to infer component interrelationships in a parts-based face model,
allowing location and reconstruction of occluded features.
| Hypothesis Testing over Factorizations for Data Association | |
The issue of data association arises frequently in sensor
networks; whenever multiple sensors and sources are present, it
may be necessary to determine which observations from different
sensors correspond to the same target. In highly uncertain
environments, one may need to determine this correspondence
without the benefit of an \emph{a priori} known joint
signal/sensor model. This paper examines the data association
problem as the more general hypothesis test between factorizations
of a single, learned distribution. The optimal test between known
distributions may be decomposed into model-dependent and
statistical dependence terms, quantifying the cost incurred by
model estimation from measurements compared to a test between
known models. We demonstrate how one might evaluate a two-signal
association test efficiently using kernel density estimation
methods to model a wide class of possible distributions, and show
the resulting algorithm's ability to determine correspondence in
uncertain conditions through a series of synthetic examples. We
then describe an extension of this technique to multi-signal
association which can be used to determine correspondence while
avoiding the computationally prohibitive task of evaluating all
hypotheses. Empirical results of the approximate approach are
presented.
| Nonparametric Belief Propagation | |
In applications of graphical models arising in fields such as computer vision,
the hidden variables of interest are most naturally specified by continuous,
non-Gaussian distributions. However, due to the limitations of existing inference
algorithms, it is often necessary to form coarse, discrete approximations to
such models. In this paper, we develop a nonparametric belief propagation
(NBP) algorithm, which uses stochastic methods to propagate kernel-based approximations
to the true continuous messages. Each NBP message update is based on an efficient
sampling procedure which can accommodate an extremely broad class of potential
functions, allowing easy adaptation to new application areas. We validate our
method using comparisons to continuous BP for Gaussian networks, and an application
to the stereo vision problem.
| Nonparametric Estimators for Online Signature Authentication | |
We present extensions to our previous work in modelling dynamical processes.
The approach uses an information theoretic criterion for searching over
subspaces of the past observations, combined with a nonparametric density
characterizing its relation to one-step-ahead prediction and uncertainty.
We use this methodology to model handwriting stroke data, specifically
signatures, as a dynamical system and show that it is possible to learn
a model capturing their dynamics for use either in synthesizing realistic
signatures or in discriminating between signatures and forgeries, even
though no forgeries have been used in constructing the model.
This novel approach yields promising results even for small training sets.
| Learning Informative Statistics: A Nonparametric Approach | |
We discuss an information theoretic approach for categorizing and
modeling dynamic processes. The approach can learn a compact and
informative statistic which summarizes past states to predict future
observations. Furthermore, the uncertainty of the prediction is
characterized nonparametrically by a joint density over the learned
statistic and present observation. We discuss the application of the
technique to both noise driven dynamical systems and random
processes sampled from a density which is conditioned on the
past. In the first case we show results in which both the dynamics
of a random walk and the statistics of the driving noise are
captured. In the second case we present results in which a
summarizing statistic is learned on noisy random telegraph waves
with differing dependencies on past states. In both cases the
algorithm yields a principled approach for discriminating processes
with differing dynamics and/or dependencies. The method is grounded
in ideas from information theory and nonparametric statistics.
