Abstracts

A Graphical Model Representation of the Track-Oriented Multiple Hypothesis Tracker

Frank, Smyth, Ihler

The track-oriented multiple hypothesis tracker is currently the preferred method for tracking multiple targets in clutter with medium to high computational resources. This method maintains a structured representation of the track posterior distribution, which it repeatedly extends and optimizes over. This representation of the posterior admits probabilistic inference tasks beyond MAP estimation that have yet to be explored. To this end we formulate the posterior as a graphical model and show that belief propagation can be used to approximate the track marginals. These approximate marginals enable an online parameter estimation scheme that improves tracker performance in the presence of parameter misspecification.

[ BibTex ] | [ PDF ]



Join-graph based cost-shifting schemes

Ihler, Flerova, Dechter, Otten

We develop several algorithms taking advantage of two common approaches for bounding MPE queries in graphical models: mini-bucket elimination and message-passing updates for linear programming relaxations. The two methods are closely related, and each offers a useful perspective on the other; our hybrid approaches attempt to balance the advantages of both. We demonstrate the power of our hybrid algorithms through extensive empirical evaluation. Most notably, a Branch and Bound search guided by the heuristic function calculated by one of our new algorithms has recently won first place in the PASCAL2 inference challenge.

[ BibTex ] | [ PDF ]



A Cluster-Cumulant Expansion at the Fixed Points of Belief Propagation

Welling, Gelfand, Ihler

We introduce a new cluster-cumulant expansion (CCE) based on the fixed points of iterative belief propagation (IBP). This expansion is similar in spirit to the loop series (LS) recently introduced by Chertkov and Chernyak (2006). However, in contrast to the latter, the CCE enjoys the following important qualities: 1) it is defined for arbitrary state spaces, 2) it is easily extended to fixed points of generalized belief propagation (GBP), 3) disconnected groups of variables do not contribute to the CCE, and 4) the accuracy of the expansion empirically improves upon that of the LS. The CCE is based on the same M\"obius transform as the Kikuchi approximation, but unlike GBP it does not require storing the beliefs of the GBP clusters, nor does it suffer from convergence issues during belief updating.

[ BibTex ] | [ PDF ]



Belief Propagation for Structured Decision Making

Liu, Ihler

Variational inference algorithms such as belief propagation have had tremendous impact on our ability to learn and use graphical models, and give many insights for developing or understanding exact and approximate inference. However, variational approaches have not been widely adopted for decision making in graphical models, which is often formulated through influence diagrams and can include both centralized and decentralized (or multi-agent) decisions. In this work, we present a general variational framework for solving structured cooperative decision-making problems, use it to propose several belief propagation-like algorithms, and analyze them both theoretically and empirically.

[ BibTex ] | [ PDF ]



Approximating the Sum Operation for Marginal-MAP Inference

Cheng, Chen, Dong, Xu, Ihler

We study the marginal-MAP problem on graphical models, and present a novel approximation method based on direct approximation of the sum operation. A primary difficulty of marginal-MAP problems lies in the non-commutativity of the sum and max operations, so that even in highly structured models, marginalization may produce a densely connected graph over the variables to be maximized, resulting in an intractable potential function of exponential size. We propose a chain decomposition approach for summing over the marginalized variables, in which we produce a structured approximation to the MAP component of the problem consisting of only pairwise potentials. We show that this approach is equivalent to the maximization of a specific variational free energy, and that it provides an upper bound on the optimal probability. Finally, experimental results demonstrate that our method performs favorably compared to previous methods.

[ BibTex ] | [ PDF ]



Distributed Parameter Estimation via Pseudo-likelihood

Liu, Ihler

Estimating statistical models within sensor networks requires distributed algorithms, in which both data and computation are distributed across the nodes of the network. We propose a general approach for distributed learning based on combining local estimators defined by pseudo-likelihood components, encompassing a number of combination methods, and provide both theoretical and experimental analysis. We show that simple linear combination or max-voting methods, when combined with second-order information, are statistically competitive with more advanced and costly joint optimization. Our algorithms have many attractive properties including low communication and computational cost and "any-time" behavior.
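
As a rough illustration of the combination scheme (not the paper's code; the data matrix below is a random placeholder standing in for samples from the model), the sketch fits one pseudo-likelihood component per node — a logistic regression of each variable on the rest, which recovers Ising edge weights — and merges the two directed estimates of each edge by simple averaging:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, N = 5, 2000
X = rng.choice([-1, 1], size=(N, d))   # placeholder; real data: samples from the MRF

# Each node fits its local pseudo-likelihood component: a logistic regression of
# x_i on the remaining variables. For +/-1 spins in an Ising model,
# P(x_i = 1 | rest) = sigmoid(2 h_i + 2 sum_j J_ij x_j), so coefficients are 2*J.
J = np.zeros((d, d))
for i in range(d):
    others = np.delete(np.arange(d), i)
    clf = LogisticRegression(C=10.0).fit(X[:, others], X[:, i])
    J[i, others] = clf.coef_[0] / 2.0

# Simple linear combination: edge (i, j) averages i's and j's local estimates.
J_hat = (J + J.T) / 2.0
print(np.round(J_hat, 2))
```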

[ BibTex ] | [ PDF ]



Mini-bucket Elimination with Moment Matching

Flerova, Ihler, Dechter, Otten

We investigate a hybrid of two styles of algorithms for deriving bounds for optimization tasks over graphical models: non-iterative message-passing schemes exploiting variable duplication to reduce cluster sizes (e.g. MBE) and iterative methods that re-parameterize the problem's functions aiming to produce good bounds even if functions are processed independently (e.g. MPLP). In this work we combine both ideas, augmenting MBE with re-parameterization, which we call MBE with Moment Matching (MBE-MM). The results of preliminary empirical evaluations show the clear promise of the hybrid scheme over its individual components (e.g., pure MBE and pure MPLP). Most significantly, we demonstrate the potential of the new bounds in improving the power of mechanically generated heuristics for branch and bound search.

[ BibTex ] | [ PDF ]



Adaptive Exact Inference in Graphical Models

Sumer, Acar, Ihler, Mettu

Many algorithms and applications involve repeatedly solving variations of the same inference problem, for example to introduce new evidence to the model or to change conditional dependencies. As the model is updated, the goal of \emph{adaptive inference} is to take advantage of previously computed quantities to perform inference more rapidly than from scratch. In this paper, we present algorithms for adaptive exact inference on general graphs that can be used to efficiently compute marginals and update MAP configurations under arbitrary changes to the input factor graph and its associated elimination tree. After a linear time preprocessing step, our approach enables updates to the model and the computation of any marginal in time that is logarithmic in the size of the input model. Moreover, in contrast to max-product, our approach can also be used to update MAP configurations in time that is roughly proportional to the number of updated entries, rather than the size of the input model. To evaluate the practical effectiveness of our algorithms, we implement and test them using synthetic data as well as for two real-world computational biology applications. Our experiments show that adaptive inference can achieve substantial speedups over performing complete inference as the model undergoes small changes over time.

[ BibTex ] | [ PDF ] | [ PS ]



Bounding the Partition Function using Holder's Inequality

Liu, Ihler

We describe an algorithm for approximate inference in graphical models based on H\"older's inequality that provides upper and lower bounds on common summation problems such as computing the partition function or probability of evidence in a graphical model. Our algorithm unifies and extends several existing approaches, including variable elimination techniques such as mini-bucket elimination and variational methods such as tree-reweighted belief propagation and conditional entropy decomposition. We show that our method inherits benefits from each approach to provide significantly better bounds on sum-product tasks.
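
A minimal numeric illustration of the underlying bound (our own toy, not the paper's algorithm): for two positive factors of a shared variable, Hölder's inequality with weights $w$ and $1-w$ upper-bounds the sum of their product, and the weight can be tuned to tighten the bound — the one-variable core of the weighted elimination construction:

```python
import numpy as np

rng = np.random.default_rng(0)
f, g = rng.random(10), rng.random(10)   # two positive factors of one variable x

Z = np.sum(f * g)                       # exact sum_x f(x) g(x)

def holder_bound(w):
    # Hölder: sum f g <= (sum f^(1/w))^w * (sum g^(1/(1-w)))^(1-w), 0 < w < 1.
    return np.sum(f ** (1 / w)) ** w * np.sum(g ** (1 / (1 - w))) ** (1 - w)

ws = np.linspace(0.05, 0.95, 19)
print(f"exact {Z:.4f}, best bound {min(holder_bound(w) for w in ws):.4f}")
```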

[ BibTex ] | [ PDF ]



Fast parallel and adaptive updates for dual-decomposition solvers

Sumer, Acar, Ihler, Mettu

Dual-decomposition (DD) methods are quickly becoming important tools for estimating the minimum energy state of a graphical model. DD methods decompose a complex model into a collection of simpler subproblems that can be solved exactly (such as trees), that in combination provide upper and lower bounds on the exact solution. Subproblem choice can play a major role: larger subproblems tend to improve the bound more per iteration, while smaller subproblems enable highly parallel solvers and can benefit from re-using past solutions when there are few changes between iterations. We propose an algorithm that can balance many of these aspects to speed up convergence. Our method uses a cluster tree data structure that has been proposed for adaptive exact inference tasks, and we apply it in this paper to dual-decomposition approximate inference. This approach allows us to process large subproblems to improve the bounds at each iteration, while allowing a high degree of parallelizability and taking advantage of subproblems with sparse updates. For both synthetic inputs and a real-world stereo matching problem, we demonstrate that our algorithm is able to achieve significant improvement in convergence time.
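
For orientation, a toy dual-decomposition loop (deliberately simplified; it omits the paper's cluster-tree machinery, parallelism, and adaptive updates): a 3-node cycle is split into two chain subproblems, each solved exactly by brute force, and projected subgradient steps on per-copy unary terms tighten the dual lower bound:

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)
# A tiny 3-cycle MRF over binary variables; the energy is a sum of pairwise terms.
theta = {e: rng.normal(size=(2, 2)) for e in [(0, 1), (1, 2), (0, 2)]}

subs = [[(0, 1), (1, 2)], [(0, 2)]]        # decompose the cycle into two chains
unary = [np.zeros((3, 2)) for _ in subs]   # per-copy unary terms, summing to zero

def solve(edges, u):
    # Brute-force MAP of one subproblem (fine for three binary variables).
    best, arg = np.inf, None
    for x in itertools.product([0, 1], repeat=3):
        e = sum(theta[a, b][x[a], x[b]] for a, b in edges)
        e += sum(u[i, x[i]] for i in range(3))
        if e < best:
            best, arg = e, np.array(x)
    return best, arg

for it in range(100):
    bound, args = 0.0, []
    for edges, u in zip(subs, unary):
        b, x = solve(edges, u)
        bound += b
        args.append(np.eye(2)[x])          # one-hot encoding of the argmin
    # Projected subgradient step: push each copy toward the average assignment;
    # subtracting the mean keeps the unary copies summing to zero.
    mean = np.mean(args, axis=0)
    for u, a in zip(unary, args):
        u += 0.5 / (it + 1) * (a - mean)

exact, _ = solve([(0, 1), (1, 2), (0, 2)], np.zeros((3, 2)))
print(f"dual lower bound {bound:.3f} <= exact min energy {exact:.3f}")
```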

[ BibTex ] | [ PDF ]



Tightening MRF relaxations with planar subproblems

Yarkony, Morshed, Ihler, Fowlkes

We describe a new technique for computing lower-bounds on the minimum energy configuration of a planar Markov Random Field (MRF). Our method successively adds large numbers of constraints and enforces consistency over binary projections of the original problem state space. These constraints are represented in terms of subproblems in a dual-decomposition framework that is optimized using subgradient techniques. The complete set of constraints we consider enforces cycle consistency over the original graph. In practice we find that the method converges quickly on most problems with the addition of a few subproblems and outperforms existing methods for some interesting classes of hard potentials.

[ BibTex ] | [ PDF ]



Planar cycle covering graphs

Yarkony, Ihler, Fowlkes

We describe a new variational lower-bound on the minimum energy configuration of a planar binary Markov Random Field (MRF). Our method is based on adding auxiliary nodes to every face of a planar embedding of the graph in order to capture the effect of unary potentials. A ground state of the resulting approximation can be computed efficiently by reduction to minimum-weight perfect matching. We show that optimization of variational parameters achieves the same lower-bound as dual-decomposition into the set of all cycles of the original graph. We demonstrate that our variational optimization converges quickly and provides high-quality solutions to hard combinatorial problems 10-100x faster than competing algorithms that optimize the same bound.

[ BibTex ] | [ PDF ]



Variational algorithms for marginal MAP

Liu, Ihler

Marginal MAP problems are notoriously difficult tasks for graphical models. We derive a general variational framework for solving marginal MAP problems, in which we apply analogues of the Bethe, tree-reweighted, and mean field approximations. We then derive a ``mixed'' message passing algorithm and a convergent alternative using CCCP to solve the BP-type approximations. Theoretically, we give conditions under which the decoded solution is a global or local optimum, and obtain novel upper bounds on solutions. Experimentally we demonstrate that our algorithms outperform related approaches. We also show that EM and variational EM comprise a special case of our framework.

[ BibTex ] | [ PDF ]



Fault Detection via Nonparametric Belief Propagation

Bickson, Baron, Ihler, Avissar, Dolev

We consider the problem of identifying a pattern of faults from a set of noisy linear measurements. Unfortunately, maximum a posteriori probability estimation of the fault pattern is computationally intractable. To solve the fault identification problem, we propose a non-parametric belief propagation approach. We show empirically that our belief propagation solver is more accurate than recent state-of-the-art algorithms including interior point methods and semidefinite programming. Our superior performance is explained by the fact that we take into account both the binary nature of the individual faults and the sparsity of the fault pattern arising from their rarity.

[ BibTex ] | [ Link ]



Revisiting MAP Estimation, Message Passing and Perfect Graphs

Foulds, Navaroli, Smyth, Ihler

Given a graphical model, one of the most useful queries is to find the most likely configuration of its variables. This task, known as the maximum a posteriori (MAP) problem, can be solved efficiently via message passing techniques when the graph is a tree, but is NP-hard for general graphs. Jebara (2009) shows that the MAP problem can be converted into the stable set problem, which can be solved in polynomial time for a broad class of graphs known as perfect graphs via a linear programming relaxation technique. This is a result of great theoretical interest. However, the article additionally claims that max-product linear programming (MPLP) message passing techniques of Globerson & Jaakkola (2007) are also guaranteed to solve these problems exactly and efficiently. We investigate this claim, show that it does not hold in general, and attempt to repair it with several alternative message passing algorithms.

[ BibTex ] | [ PDF ]



Multicore Gibbs Sampling in Dense, Unstructured Graphs

Xu, Ihler

Multicore computing is on the rise, but algorithms such as Gibbs sampling are fundamentally sequential and may require careful redesign to be parallelized. Existing techniques either exploit sparse problem structure or make approximations to the algorithm; in this work, we explore an alternative to these ideas. We develop a parallel Gibbs sampling algorithm for shared-memory systems that does not require any independence structure among the variables yet does not approximate the sampling distributions. Our method relies on a look-ahead sampler, which uses bounds to attempt to sample variables before the results of other threads are made available. We demonstrate our algorithm on Gibbs sampling in Boltzmann machines and latent Dirichlet allocation (LDA). We show in experiments that our algorithm achieves near linear speed-up in the number of cores, is faster than existing exact samplers, and is nearly as fast as approximate samplers while maintaining the correct stationary distribution.
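
A simplified, single-threaded rendering of the look-ahead decision rule (names and the single-pending-neighbor restriction are ours; the paper runs this across threads): bound the conditional over the unresolved neighbor's two possible values, and commit the sample only when the uniform draw makes the outcome invariant:

```python
import numpy as np

def lookahead_sample(i, j, x, J, h, u):
    """Attempt to sample x_i before neighbor j's new value is known.
    x is the current 0/1 state vector, J and h are Boltzmann machine couplings
    and biases, and u is a uniform(0,1) draw reserved for this update."""
    # Field at node i, excluding the contribution of the unresolved neighbor j.
    base = h[i] + J[i] @ x - J[i, i] * x[i] - J[i, j] * x[j]
    f_lo, f_hi = sorted((base, base + J[i, j]))    # fields for x_j = 0 and 1
    p_lo, p_hi = 1 / (1 + np.exp(-f_lo)), 1 / (1 + np.exp(-f_hi))
    if u < p_lo:
        return 1      # x_i = 1 no matter which value x_j takes
    if u >= p_hi:
        return 0      # x_i = 0 no matter which value x_j takes
    return None       # undecidable: must wait for the other thread's result
```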

[ BibTex ] | [ PDF ]



Learning Scale Free Networks by Reweighted L1 regularization

Liu, Ihler

Methods for L1-type regularization have been widely used in Gaussian graphical model selection tasks to encourage sparse structures. However, often we would like to include more structural information than mere sparsity. In this work, we focus on learning so-called ``scale-free'' models, a common feature that appears in many real-world networks. We replace the L1 regularization with a power law regularization and optimize the objective function by a sequence of iteratively reweighted L1 regularization problems, where the regularization coefficients of nodes with high degree are reduced, encouraging the appearance of hubs. Our method can be easily adapted to improve any existing L1-based methods, such as graphical lasso, neighborhood selection, and JSRM when the underlying networks are believed to be scale free or have dominating hubs. We demonstrate in simulation that our method significantly outperforms a baseline L1 method at learning scale-free networks and hub networks, and also illustrate its behavior on gene expression data.
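
A sketch of one plausible instantiation via neighborhood selection (our own simplification, assuming standardized Gaussian data): each round fits per-node lasso regressions, then shrinks the penalty on nodes with high estimated degree, approximating the power-law regularizer by iterative reweighting:

```python
import numpy as np
from sklearn.linear_model import Lasso

def scale_free_select(X, alpha=0.1, eps=0.01, iters=5):
    """Iteratively reweighted L1 neighborhood selection on data X of shape (n, d).
    Nodes with high estimated degree get a smaller penalty in the next round,
    encouraging hubs as in the power-law (log-degree) regularizer."""
    n, d = X.shape
    node_w = np.ones(d)
    B = np.zeros((d, d))
    for _ in range(iters):
        for i in range(d):
            others = np.delete(np.arange(d), i)
            w = np.sqrt(node_w[i] * node_w[others])
            # Penalizing coefficient j by w_j equals a plain lasso on X_j / w_j.
            la = Lasso(alpha=alpha, fit_intercept=False)
            la.fit(X[:, others] / w, X[:, i])
            B[i, others] = la.coef_ / w
        degree = np.abs(B).sum(axis=0)     # soft proxy for node degree
        node_w = 1.0 / (degree + eps)      # reweight: hubs are penalized less
    return (np.abs(B) + np.abs(B.T)) / 2   # symmetrized edge strengths
```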

[ BibTex ] | [ PDF ]



Understanding Errors in Approximate Distributed Latent Dirichlet Allocation

Ihler, Newman

Latent Dirichlet allocation (LDA) is a popular algorithm for discovering semantic structure in large collections of text or other data. Although its complexity is linear in the data size, its use on increasingly massive collections has created considerable interest in parallel implementations. ``Approximate distributed'' LDA, or AD-LDA, approximates the popular collapsed Gibbs sampling algorithm for LDA models while running on a distributed architecture. Although this algorithm often appears to perform well in practice, its quality is not well understood theoretically or easily assessed on new data. In this work, we theoretically justify the approximation, and modify AD-LDA to track an error bound on performance. Specifically, we upper-bound the probability of making a sampling error at each step of the algorithm (compared to an exact, sequential Gibbs sampler), given the samples drawn thus far. We show empirically that our bound is sufficiently tight to give a meaningful and intuitive measure of approximation error in AD-LDA, allowing the user to track the trade-off between accuracy and efficiency while executing in parallel.

[ BibTex ] | [ PDF ] | [ Link ]



Nonparametric Belief Propagation

Sudderth, Ihler, Isard, Freeman, Willsky

Continuous quantities are ubiquitous in models of real-world phenomena, but are surprisingly difficult to reason about automatically. Probabilistic graphical models such as Bayesian networks and Markov random fields, and algorithms for approximate inference such as belief propagation, have proven to be powerful tools in a wide range of applications in statistics and artificial intelligence. However, applying these methods to models with continuous variables remains a challenging task. In this work we describe an extension of belief propagation to continuous variable models, generalizing particle filtering and Gaussian mixture filtering techniques for time series to more complex models. We illustrate the power of the resulting nonparametric belief propagation algorithm via two applications: kinematic tracking of visual motion and distributed localization in sensor networks.

[ BibTex ] | [ Link ]



Negative Tree-reweighted Belief Propagation

Liu, Ihler

We introduce a new class of lower bounds on the log partition function of a Markov random field which makes use of a reversed Jensen's inequality. In particular, our method approximates the intractable distribution using a linear combination of spanning trees with negative weights. This technique is a lower-bound counterpart to the tree-reweighted belief propagation algorithm, which uses a convex combination of spanning trees with positive weights to provide corresponding upper bounds. We develop algorithms to optimize and tighten the lower bounds over the non-convex set of valid parameter values. Our algorithm generalizes mean field approaches (including na\"ive and structured mean field approximations), which it includes as a limiting case.

[ BibTex ] | [ PDF ]



Covering Trees and Lower Bounds on Quadratic Assignment

Yarkony, Fowlkes, Ihler

Many computer vision problems involving feature correspondence among images can be formulated as an assignment problem with a quadratic cost function. Such problems are computationally infeasible in general, but recent advances in discrete optimization such as tree-reweighted belief propagation (TRW) often provide high-quality solutions. In this paper, we improve upon these algorithms in two ways. First, we introduce covering trees, a variant of TRW that provides the same bounds on the MAP energy with far fewer variational parameters. Optimization of these parameters can be carried out efficiently using either fixed-point iterations (as in TRW) or subgradient-based techniques. Second, we introduce a new technique that utilizes bipartite matching applied to the min-marginals produced with covering trees in order to compute a tighter lower bound for the quadratic assignment problem. We apply this machinery to the problem of finding correspondences with pairwise energy functions, and demonstrate that the resulting hybrid method outperforms TRW alone and a recent related subproblem decomposition algorithm on benchmark image correspondence problems.

[ BibTex ] | [ PDF ]



Particle Filtered MCMC-MLE with Connections to Contrastive Divergence

Asuncion, Liu, Ihler, Smyth

Learning undirected graphical models such as Markov random fields is an important machine learning task with applications in many domains. Since it is usually intractable to learn these models exactly, various approximate learning techniques have been developed, such as contrastive divergence (CD) and Markov chain Monte Carlo maximum likelihood estimation (MCMC-MLE). In this paper, we introduce particle filtered MCMC-MLE, which is a sampling-importance-resampling version of MCMC-MLE with additional MCMC rejuvenation steps. We also describe a unified view of (1) MCMC-MLE, (2) our particle filtering approach, and (3) a stochastic approximation procedure known as persistent contrastive divergence. We show how these approaches are related to each other and discuss the relative merits of each approach. Empirical results on various undirected models demonstrate that the particle filtering technique we propose in this paper can significantly outperform MCMC-MLE. Furthermore, in certain cases, the proposed technique is faster than persistent CD.

[ BibTex ] | [ PDF ]



Learning with Blocks: Composite Likelihood and Contrastive Divergence

Asuncion, Liu, Ihler, Smyth

Composite likelihood methods provide a wide spectrum of computationally efficient techniques for statistical tasks such as parameter estimation and model selection. In this paper, we present a formal connection between the optimization of composite likelihoods and the well-known contrastive divergence algorithm. In particular, we show that composite likelihoods can be stochastically optimized by performing a variant of contrastive divergence with random-scan blocked Gibbs sampling. By using higher-order composite likelihoods, our proposed learning framework makes it possible to trade off computation time for increased accuracy. Furthermore, one can choose composite likelihood blocks that match the model's dependence structure, making the optimization of higher-order composite likelihoods computationally efficient. We empirically analyze the performance of blocked contrastive divergence on various models, including visible Boltzmann machines, conditional random fields, and exponential random graph models, and we demonstrate that using higher-order blocks improves both the accuracy of parameter estimates and the rate of convergence.
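
An illustrative sketch of a single step of the blocked scheme on a small visible Boltzmann machine (all names and defaults are ours, not the paper's code): a random block is resampled exactly from its joint conditional — cheap when blocks are small, and the route to higher-order composite likelihoods — and the parameter update is the usual data-minus-sample statistics:

```python
import numpy as np

rng = np.random.default_rng(0)

def blocked_cd_step(W, data, blocks, k=1, lr=0.05):
    """One contrastive divergence step with random-scan blocked Gibbs.
    W: symmetric couplings of a visible Boltzmann machine (zero diagonal);
    data: (N, d) array of 0/1 samples; blocks: list of small index arrays."""
    X = data.copy()
    for _ in range(k):
        b = blocks[rng.integers(len(blocks))]      # random-scan block choice
        # Enumerate the block's joint states; exact within-block sampling.
        states = np.array(np.meshgrid(*[[0, 1]] * len(b))).reshape(len(b), -1).T
        for n in range(X.shape[0]):
            x = X[n].copy()
            en = []
            for s in states:
                x[b] = s
                en.append(x @ W @ x / 2.0)         # log p(x) up to a constant
            p = np.exp(np.array(en) - max(en))
            X[n, b] = states[rng.choice(len(states), p=p / p.sum())]
    grad = (data.T @ data - X.T @ X) / len(data)   # data minus sample statistics
    np.fill_diagonal(grad, 0)
    return W + lr * grad
```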

[ BibTex ] | [ PDF ]



Estimating Replicate Time-Shifts Using Gaussian Process Regression

Liu, Lin, Anderson, Smyth, Ihler

Motivation: Time-course gene expression datasets provide important insights into dynamic aspects of biological processes, such as circadian rhythms, cell cycle and organ development. In a typical microarray time-course experiment, measurements are obtained at each time point from multiple replicate samples. Accurately recovering the gene expression patterns from experimental observations is made challenging by both measurement noise and variation among replicates' rates of development. Prior work on this topic has focused on inference of expression patterns assuming that the replicate times are synchronized. We develop a statistical approach that simultaneously infers both (i) the underlying (hidden) expression profile for each gene, as well as (ii) the biological time for each individual replicate. Our approach is based on Gaussian process regression (GPR) combined with a probabilistic model that accounts for uncertainty about the biological development time of each replicate.
Results: We apply GPR with uncertain measurement times to a microarray dataset of mRNA expression for the hair-growth cycle in mouse back skin, predicting both profile shapes and biological times for each replicate. The predicted time shifts show high consistency with independently obtained morphological estimates of relative development. We also show that the method systematically reduces prediction error on out-of-sample data, significantly reducing the mean squared error in a cross-validation study.
Availability: Matlab code for GPR with uncertain time shifts is available at http://sli.ics.uci.edu/Code/GPRTimeshift/
Contact: ihler@ics.uci.edu
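
A stripped-down sketch of the idea (not the released code above; a flat prior over shifts and coordinate-ascent grid search stand in for the full probabilistic model): all replicates share one latent GP profile, and each replicate's shift is chosen to maximize the pooled GP marginal likelihood:

```python
import numpy as np

def gp_loglik(t, y, ell=2.0, sf=1.0, sn=0.1):
    # Log marginal likelihood of a zero-mean GP with a squared-exponential kernel.
    K = sf**2 * np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / ell**2)
    K += sn**2 * np.eye(len(t))
    L = np.linalg.cholesky(K)
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ a - np.log(np.diag(L)).sum() - 0.5 * len(t) * np.log(2 * np.pi)

def estimate_shifts(times, Y, grid=np.linspace(-2.0, 2.0, 41), sweeps=3):
    """times, Y: (R, T) nominal times and expression values for R replicates.
    All replicates share one latent profile, so each replicate's shift is set
    by coordinate ascent on the pooled marginal likelihood."""
    shifts = np.zeros(len(Y))
    for _ in range(sweeps):
        for r in range(len(Y)):
            def score(s):
                sh = shifts.copy()
                sh[r] = s
                return gp_loglik((times + sh[:, None]).ravel(), Y.ravel())
            shifts[r] = max(grid, key=score)
    return shifts - shifts[0]   # anchor replicate 0: only relative shifts matter
```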

[ BibTex ] | [ Link ]



Particle-Based Variational Inference for Continuous Systems

Ihler, Frank, Smyth

Since the development of loopy belief propagation, there has been considerable work on advancing the state of the art for approximate inference over distributions defined on discrete random variables. Improvements include guarantees of convergence, approximations that are provably more accurate, and bounds on the results of exact inference. However, extending these methods to continuous-valued systems has lagged behind. While several methods have been developed to use belief propagation on systems with continuous values, recent advances for discrete variables have not as yet been incorporated. In this context we extend a recently proposed particle-based belief propagation algorithm to provide a general framework for adapting discrete message-passing algorithms to inference in continuous systems. The resulting algorithms behave similarly to their purely discrete counterparts, extending the benefits of these more advanced inference techniques to the continuous domain.

[ BibTex ] | [ PDF ]



Bayesian detection of non-sinusoidal periodic patterns in circadian expression data

Chudova, Ihler, Lin, Andersen, Smyth

Motivation: Cyclical biological processes such as cell division and circadian regulation produce coordinated periodic expression of thousands of genes. Identification of such genes and their expression patterns is a crucial step in discovering underlying regulatory mechanisms. Existing computational methods are biased toward discovering genes that follow sine-wave patterns.
Results: We present an analysis of variance (ANOVA) periodicity detector and its Bayesian extension that can be used to discover periodic transcripts of arbitrary shapes from replicated gene expression profiles. The models are applicable when the profiles are collected at comparable time points for at least two cycles. We provide an empirical Bayes procedure for estimating parameters of the prior distributions and derive closed-form expressions for the posterior probability of periodicity, enabling efficient computation. The model is applied to two datasets profiling circadian regulation in murine liver and skeletal muscle, revealing a substantial number of previously undetected non-sinusoidal periodic transcripts in each. We also apply quantitative real-time PCR to several highly ranked non-sinusoidal transcripts in liver tissue found by the model, providing independent evidence of circadian regulation of these genes.
Availability: MATLAB software for estimating prior distributions and performing inference is available for download from http://www.datalab.uci.edu/resources/periodicity/.
Contact: dchudova@gmail.com

[ BibTex ] | [ Link ]



Bounding Sample Errors in Approximate Distributed Latent Dirichlet Allocation

Ihler, Newman

Latent Dirichlet allocation (LDA) is a popular algorithm for discovering structure in large collections of text or other data. Although its complexity is linear in the data size, its use on increasingly massive collections has created considerable interest in parallel implementations. ``Approximate distributed'' LDA, or AD-LDA, approximates the popular collapsed Gibbs sampling algorithm for LDA models while running on a distributed architecture. Although this algorithm often appears to perform well in practice, its quality is not well understood or easily assessed. In this work, we provide some theoretical justification of the algorithm, and modify AD-LDA to track an error bound on its performance. Specifically, we upper-bound the probability of making a sampling error at each step of the algorithm (compared to an exact, sequential Gibbs sampler), given the samples drawn thus far. We show empirically that our bound is sufficiently tight to give a meaningful and intuitive measure of approximation error in AD-LDA, allowing the user to understand the trade-off between accuracy and efficiency.

[ BibTex ] | [ PDF ]



Adaptive Updates for MAP Configurations with Applications to Bioinformatics

Acar, Ihler, Mettu, Sumer

Many applications involve repeatedly computing the optimal, maximum a posteriori (MAP) configuration of a graphical model as the model changes, often slowly or incrementally over time, e.g., due to input from a user. Small changes to the model often require updating only a small fraction of the MAP configuration, suggesting the possibility of performing updates faster than recomputing from scratch. In this paper we present an algorithm for efficiently performing such updates under arbitrary changes to the model. Our algorithm is within a logarithmic factor of the optimal and is asymptotically never slower than recomputing from scratch: if a modification to the model requires $m$ updates to the MAP configuration of $n$ random variables, then our algorithm requires $O(m\log{(n/m)})$ time; recomputing from scratch requires $O(n)$ time. We evaluate the practical effectiveness of our algorithm by considering two problems in genomic signal processing, CpG region segmentation and protein sidechain packing, where a MAP configuration must be repeatedly updated. Our results show significant speedups over recomputing from scratch.

[ BibTex ] | [ PDF ]



A Low Density Lattice Decoder via Non-parametric Belief Propagation

Bickson, Ihler, Avissar, Dolev

The recent work of Sommer, Feder and Shalvi presented a new family of codes called low density lattice codes (LDLC) that can be decoded efficiently and approach the capacity of the AWGN channel, together with a linear-time iterative decoding scheme based on a message-passing formulation over a factor graph.
In the current work we report our theoretical findings regarding the relation between the LDLC decoder and belief propagation. We show that the LDLC decoder is an instance of non-parametric belief propagation and further connect it to the Gaussian belief propagation algorithm. Our new results enable borrowing knowledge from the non-parametric and Gaussian belief propagation domains into the LDLC domain. Specifically, we give more general conditions for convergence of the LDLC decoder (under the same assumptions as the original LDLC convergence analysis). We discuss how to extend the LDLC decoder from Latin square to full rank, non-square matrices. We propose an efficient construction of a sparse generator matrix and its matching decoder. We report preliminary experimental results which show our decoder has a symbol error rate comparable to the original LDLC decoder.

[ BibTex ] | [ PDF ]



Circadian Clock Genes Contribute to the Regulation of Hair Follicle Cycling

Lin, Kumar, Geyfman, Chudova, Ihler, Smyth, Paus, Takahashi, Andersen

The hair follicle renews itself by repeatedly cycling among growth, regression, and rest phases. One function of hair follicle cycling is to allow seasonal changes in hair growth. Understanding the regulation of hair follicle cycling is also of interest because abnormal regulation of hair cycle control genes is responsible for several types of human hair growth disorders and skin cancers. We report here that Clock and Bmal1 genes, which control circadian rhythms, are also important for the regulation of hair follicle cycling, a biological process of much longer duration than 24 hours. Detailed analysis of skin from mice mutated for central clock genes indicates a significant delay in the progression of the hair growth phase. We show that clock genes affect the expression of key cell cycle control genes and that keratinocytes in a critical compartment of the hair follicles in Bmal1 mutant mice are halted in the G1 phase of the cell cycle. These findings provide novel insight into circadian control mechanisms in modulating the progression of cyclic biological processes on different time scales.

[ BibTex ] | [ PDF ] | [ Link ]



Particle Belief Propagation

Ihler, McAllester

The popularity of particle filtering for inference in Markov chain models defined over random variables with very large or continuous domains makes it natural to consider sample-based versions of belief propagation (BP) for more general (tree-structured or loopy) graphs. Already, several such algorithms have been proposed in the literature. However, many questions remain open about the behavior of particle-based BP algorithms, and little theory has been developed to analyze their performance. In this paper, we describe a generic particle belief propagation (PBP) algorithm which is closely related to previously proposed methods. We prove that this algorithm is consistent, approaching the true BP messages as the number of samples grows large. We then use concentration bounds to analyze the finite-sample behavior and give a convergence rate for the algorithm on tree-structured graphs. Our convergence rate is $O(1/\sqrt{n})$ where $n$ is the number of samples, independent of the domain size of the variables.
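
A minimal sketch of a single particle-based message estimate on a two-node continuous chain (potentials and the Gaussian proposal are our assumptions): particles for the source node are drawn from the proposal, and the message at the target node's particle locations is the importance-weighted average, which converges at the $O(1/\sqrt{n})$ rate discussed above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-node chain A - B with continuous states; the BP message is the integral
#   m_AB(x_B) = \int psi(x_A, x_B) phi(x_A) dx_A,
# estimated at B's particle locations by importance sampling over x_A.
phi = lambda a: np.exp(-0.5 * a**2)              # local evidence at A
psi = lambda a, b: np.exp(-0.5 * (a - b) ** 2)   # pairwise potential

n = 2000
xA = rng.normal(scale=2.0, size=n)               # proposal q = N(0, 4)
w = phi(xA) / (np.exp(-xA**2 / 8.0) / np.sqrt(8 * np.pi))   # phi / q weights

xB = np.linspace(-3, 3, 7)                       # particles at node B
m_hat = np.array([np.mean(w * psi(xA, b)) for b in xB])
print(np.round(m_hat / m_hat.max(), 3))          # sample estimate
print(np.round(np.exp(-xB**2 / 4), 3))           # exact message, same scaling
```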

[ BibTex ] | [ PDF ]



Fast Collapsed Gibbs Sampling for Latent Dirichlet Allocation

Porteous, Newman, Ihler, Asuncion, Smyth, Welling

In this paper we introduce a novel collapsed Gibbs sampling method for the widely used latent Dirichlet allocation (LDA) model. Our new method results in significant speedups on real world text corpora. Conventional Gibbs sampling schemes for LDA require O(K) operations per sample where K is the number of topics in the model. Our proposed method draws equivalent samples but requires on average significantly fewer than K operations per sample. On real-world corpora FastLDA can be as much as 8 times faster than the standard collapsed Gibbs sampler for LDA. No approximations are necessary, and we show that our fast sampling scheme produces exactly the same results as the standard (but slower) sampling scheme. Experiments on four real world data sets demonstrate speedups for a wide range of collection sizes. For the PubMed collection of over 8 million documents with a required computation time of 6 CPU months for LDA, our speedup of 5.7 can save 5 CPU months of computation.
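
For reference, the O(K) baseline being accelerated (a sketch with our own names; FastLDA draws an exactly equivalent sample while typically inspecting only a few topics by bounding partial sums, machinery omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_token(d, w, z_old, ndk, nkw, nk, alpha, beta, V):
    """Resample one token's topic in collapsed Gibbs for LDA. ndk, nkw, nk are
    the usual document-topic, topic-word, and topic count arrays; the loop over
    all K topics in computing p is the cost FastLDA avoids."""
    ndk[d, z_old] -= 1; nkw[z_old, w] -= 1; nk[z_old] -= 1
    p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
    z_new = rng.choice(len(p), p=p / p.sum())
    ndk[d, z_new] += 1; nkw[z_new, w] += 1; nk[z_new] += 1
    return z_new
```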

[ BibTex ] | [ PDF ]



Probabilistic Analysis of a Large Scale Urban Traffic Sensor Data Set

Hutchins, Ihler, Smyth

Real-world sensor time series are often significantly noisier and more difficult to work with than the relatively clean data sets that tend to be used as the basis for experiments in many research papers. In this paper we report on a large case study involving statistical data mining of over 300 million measurements from 1700 freeway traffic sensors over a period of seven months in Southern California. We discuss the challenges posed by the wide variety of different sensor failures and anomalies present in the data. The volume and complexity of the data precludes the use of manual visualization or simple thresholding techniques to identify these anomalies. We describe the application of probabilistic modeling and unsupervised learning techniques to this data set and illustrate how these approaches can successfully detect underlying systematic patterns even in the presence of substantial noise and missing data.

[ BibTex ] | [ PDF ] | [ Link ]



Adaptive Inference in General Graphical Models

Acar, Ihler, Mettu, Sumer

Many algorithms and applications involve repeatedly solving variations of the same inference problem; for example, we may want to introduce new evidence to the model or perform updates to conditional dependencies. The goal of \emph{adaptive inference} is to take advantage of what is preserved in the model and perform inference more rapidly than from scratch. In this paper, we describe techniques for adaptive inference on general graphs that support marginal computation and updates to the conditional probabilities and dependencies in logarithmic time. We give experimental results for an implementation of our algorithm, and demonstrate its potential performance benefit in the study of protein structure.

[ BibTex ] | [ PDF ] | [ PS ]



Modeling Count Data from Multiple Sensors: A Building Occupancy Model

Hutchins, Ihler, Smyth

Knowledge of the number of people in a building at a given time is crucial for applications such as emergency response. Sensors can be used to gather noisy measurements which when combined, can be used to make inferences about the location, movement and density of people. In this paper we describe a probabilistic model for predicting the occupancy of a building using networks of people-counting sensors. This model provides robust predictions given typical sensor noise as well as missing and corrupted data from malfunctioning sensors. We experimentally validate the model by comparing it to a baseline method using real data from a network of optical counting sensors in a campus building.

[ BibTex ] | [ PDF ] | [ PS ]



Adaptive Bayesian Inference

Acar, Ihler, Mettu, Sumer

Motivated by stochastic systems in which observed evidence and conditional dependencies between states of the network change over time, and certain quantities of interest (marginal distributions, likelihood estimates, etc.) must be updated, we study the problem of \emph{adaptive inference} in tree-structured Bayesian networks. We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods, all in expected logarithmic time. We give an implementation of our algorithm and provide experiments that show that the algorithm can yield up to two orders of magnitude speedups on answering queries and responding to dynamic changes over the sum-product algorithm.

[ BibTex ] | [ PDF ] | [ PS ]



Learning to detect events with Markov-modulated Poisson processes

Ihler, Hutchins, Smyth

Time-series of count data occur in many different contexts, including Internet navigation logs, freeway traffic monitoring, and security logs associated with buildings. In this article we describe a framework for detecting anomalous events in such data using an unsupervised learning approach. Normal periodic behavior is modeled via a time-varying Poisson process model, which in turn is modulated by a hidden Markov process that accounts for bursty events. We outline a Bayesian framework for learning the parameters of this model from count time-series. Two large real-world datasets of time-series counts are used as testbeds to validate the approach, consisting of freeway traffic data and logs of people entering and exiting a building. We show that the proposed model is significantly more accurate at detecting known events than a more traditional threshold-based technique. We also describe how the model can be used to investigate different degrees of periodicity in the data, including systematic day-of-week and time-of-day effects, and to make inferences about different aspects of events such as number of vehicles or people involved. The results indicate that the Markov-modulated Poisson framework provides a robust and accurate framework for adaptively and autonomously learning how to separate unusual bursty events from traces of normal human activity.
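
A bare-bones sketch of the detection computation under stated simplifications (two hidden states, known rates, no Bayesian parameter learning — all choices ours): forward filtering of a Markov-modulated Poisson process returns the posterior probability of an event at each time step:

```python
import numpy as np
from scipy.stats import poisson

def mmpp_filter(counts, base_rate, event_rate=20.0, p_stay=0.99, p_event=0.95):
    """Forward filtering for a two-state Markov-modulated Poisson process.
    State 0 is normal traffic at base_rate[t]; state 1 adds event_rate on top.
    Returns P(event at t | counts up to t) for each time step."""
    A = np.array([[p_stay, 1 - p_stay],
                  [1 - p_event, p_event]])     # hidden-state transition matrix
    alpha = np.array([0.99, 0.01])             # initial state distribution
    out = []
    for t, n in enumerate(counts):
        rates = np.array([base_rate[t], base_rate[t] + event_rate])
        alpha = (alpha @ A) * poisson.pmf(n, rates)   # predict, then update
        alpha /= alpha.sum()
        out.append(alpha[1])
    return np.array(out)
```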

[ BibTex ] | [ Link ]



Accuracy Bounds for Belief Propagation

Ihler

The belief propagation algorithm is widely applied to perform approximate inference on arbitrary graphical models, in part due to its excellent empirical properties and performance. However, little is known theoretically about when this algorithm will perform well. Using recent analysis of convergence and stability properties in belief propagation and new results on approximations in binary systems, we derive a bound on the error in BP's estimates for pairwise Markov random fields over discrete-valued random variables. Our bound is relatively simple to compute, and compares favorably with a previous method for bounding the accuracy of belief propagation.

[ BibTex ] | [ PDF ] | [ PS ]



Graphical Models for Statistical Inference and Data Assimilation

Ihler, Kirshner, Ghil, Robertson, Smyth

In data assimilation for a system which evolves in time, one combines past and current observations with a model of the dynamics of the system, in order to improve the simulation of the system as well as any future predictions about it. From a statistical point of view, this process can be regarded as estimating many random variables which are related both spatially and temporally: given observations of some of these variables, typically corresponding to times past, we require estimates of several others, typically corresponding to future times.

Graphical models have emerged as an effective formalism for assisting in these types of inference tasks, particularly for large numbers of random variables. Graphical models provide a means of representing dependency structure among the variables, and can provide both intuition and efficiency in estimation and other inference computations. We provide an overview and introduction to graphical models, and describe how they can be used to represent statistical dependency and how the resulting structure can be used to organize computation. The relation between statistical inference using graphical models and optimal sequential estimation algorithms such as Kalman filtering is discussed. We then give several additional examples of how graphical models can be applied to climate dynamics, specifically estimation using multi-resolution models of large-scale data sets such as satellite imagery, and learning hidden Markov models to capture rainfall patterns in space and time.

[ BibTex ] | [ PDF ] | [ Link ]



Learning Time-Intensity Profiles of Human Activity Using Nonparametric Bayesian Models

Ihler, Smyth

Data sets that characterize human activity over time through collections of timestamped events or counts are of increasing interest in application areas such as human-computer interaction, video surveillance, and Web data analysis. We propose a non-parametric Bayesian framework for modeling collections of such data. In particular, we use a Dirichlet process framework for learning a set of intensity functions corresponding to different categories, which form a basis set for representing individual time-periods (e.g., several days) depending on which categories the time-periods are assigned to. This allows the model to learn in a data-driven fashion what “factors” are generating the observations on a particular day, including (for example) weekday versus weekend effects or day-specific effects corresponding to unique (single-day) occurrences of unusual behavior, sharing information where appropriate to obtain improved estimates of the behavior associated with each category. Applications to real-world data sets of count data involving both vehicles and people are used to illustrate the technique.

[ BibTex ] | [ PDF ] | [ PS ]



Adaptive Event Detection with Time-Varying Poisson Processes

Ihler, Hutchins, Smyth

Time-series of count data are generated in many different contexts, such as web access logging, freeway traffic monitoring, and security logs associated with buildings. Since this data measures the aggregated behavior of individual human beings, it typically exhibits a periodicity in time on a number of scales (daily, weekly, etc.) that reflects the rhythms of the underlying human activity and makes the data appear non-homogeneous. At the same time, the data is often corrupted by a number of bursty periods of unusual behavior such as building events, traffic accidents, and so forth. The data mining problem of finding and extracting these anomalous events is made difficult by both of these elements. In this paper we describe a framework for unsupervised learning in this context, based on a time-varying Poisson process model that can also account for anomalous events. We show how the parameters of this model can be learned from count time series using statistical estimation techniques. We demonstrate the utility of this model on two datasets for which we have partial ground truth in the form of known events, one from freeway traffic data and another from building access data, and show that the model performs significantly better than a non-probabilistic, threshold-based technique. We also describe how the model can be used to investigate different degrees of periodicity in the data, including systematic day-of-week and time-of-day effects, and make inferences about the detected events (e.g., popularity or level of attendance). Our experimental results indicate that the proposed time-varying Poisson model provides a robust and accurate framework for adaptively and autonomously learning how to separate unusual bursty events from traces of normal human activity.

[ BibTex ] | [ PDF ] | [ PS ]



Gibbs Sampling for (Coupled) Infinite Mixture Models in the Stick Breaking Representation

Porteous, Ihler, Smyth, Welling

Nonparametric Bayesian approaches to clustering, information retrieval, language modeling and object recognition have recently shown great promise as a new paradigm for unsupervised data analysis. Most contributions have focused on the Dirichlet process mixture models or extensions thereof for which efficient Gibbs samplers exist. In this paper we explore Gibbs samplers for infinite complexity mixture models in the stick breaking representation. The advantage of this representation is improved modeling flexibility. For instance, one can design the prior distribution over cluster sizes or couple multiple infinite mixture models (e.g., over time) at the level of their parameters (i.e., the dependent Dirichlet process model). However, Gibbs samplers for infinite mixture models (as recently introduced in the statistics literature) seem to mix poorly over cluster labels. Among other issues, this can have the adverse effect that labels for the same cluster in coupled mixture models are mixed up. We introduce additional moves in these samplers to improve mixing over cluster labels and to bring clusters into correspondence. An application to modeling of storm trajectories is used to illustrate these ideas.

[ BibTex ] | [ PDF ]



Distributed Fusion in Sensor Networks

Cetin, Chen, Fisher, Ihler, Moses, Wainwright, Willsky

Distributed inference methods developed for graphical models comprise a principled approach for data fusion in sensor networks. The application of these methods, however, requires some care due to a number of issues that are particular to sensor networks. Chief among these are the distributed nature of computation and deployment, coupled with the communications bandwidth and energy constraints typical of many sensor networks. Additionally, information sharing in a sensor network necessarily involves approximation. Traditional measures of distortion are not sufficient to characterize the quality of approximation as they do not address in an explicit manner the resulting impact on inference, which is at the core of many data fusion problems. While both graphical models and a distributed sensor network have network structures associated with them, the mapping is not one-to-one. All of these issues complicate the mapping of a particular inference problem to a given sensor network structure. Indeed, there may be a variety of mappings with very different characteristics with regard to computational complexity and utilization of resources. Nevertheless, it is the case that many of the powerful distributed inference methods have a role in information fusion for sensor networks. In this article we present an overview of research conducted by the authors that has sought to clarify many of the important issues at the intersection of these domains. We discuss both theoretical issues and prototypical applications in addition to suggesting new lines of reasoning.

[ BibTex ] | [ PDF ]



Particle Filtering Under Communications Constraints

Ihler, Fisher, Willsky

Particle filtering is often applied to the problem of object tracking under non-Gaussian uncertainty; however, sensor networks frequently require that the implementation be local to the region of interest, eventually forcing the large, sample-based representation to be moved among power-constrained sensors. We consider the problem of successive approximation (i.e., lossy compression) of each sample-based density estimate, in particular exploring the consequences (both theoretical and empirical) of several possible choices of loss function and their interpretation in terms of future errors in inference, justifying their use for measuring approximations in distributed particle filtering.

[ BibTex ] | [ PDF ] | [ PS ]



Estimating Dependency and Significance for High-Dimensional Data

Siracusa, Tieu, Ihler, Fisher, Willsky

Understanding the dependency structure of a set of variables is a key component in various signal processing applications which involve data association. The simple task of detecting whether any dependency exists is particularly difficult when models of the data are unknown or difficult to characterize because of high-dimensional measurements. We review the use of nonparametric tests for characterizing dependency and how to carry out these tests with high-dimensional observations. In addition we present a method to assess the significance of the tests.

[ BibTex ] | [ PDF ]



Nonparametric Belief Propagation for Sensor Network Self-Calibration

Ihler, Fisher, Moses, Willsky

Automatic self-localization is a critical need for the effective use of ad-hoc sensor networks in military or civilian applications. In general, self-localization involves the combination of absolute location information (e.g., GPS) with relative calibration information (e.g., distance measurements between sensors) over regions of the network. Furthermore, it is generally desirable to distribute the computational burden across the network and minimize the amount of inter-sensor communication. We demonstrate that the information used for sensor localization is fundamentally local with regard to the network topology and use this observation to reformulate the problem within a graphical model framework. We then present and demonstrate the utility of \emph{nonparametric belief propagation} (NBP), a recent generalization of particle filtering, for both estimating sensor locations and representing location uncertainties. NBP has the advantage that it is easily implemented in a distributed fashion, admits a wide variety of statistical models, and can represent multi-modal uncertainty. Using simulations of small- to moderately-sized sensor networks, we show that NBP may be made robust to outlier measurement errors by a simple model augmentation, and that judicious message construction can result in better estimates. Furthermore, we provide an analysis of NBP's communications requirements, showing that typically only a few messages per sensor are required, and that even low bit-rate approximations of these messages can have little or no performance impact.

[ BibTex ] | [ PDF ] | [ PS ]



Loopy Belief Propagation: Convergence and Effects of Message Errors

Ihler, Fisher, Willsky

Belief propagation (BP) is an increasingly popular method of performing approximate inference on arbitrary graphical models. At times, even further approximations are required, whether due to quantization of the messages or model parameters, from other simplified message or model representations, or from stochastic approximation methods. The introduction of such errors into the BP message computations has the potential to affect the solution obtained adversely. We analyze the effect resulting from message approximation under two particular measures of error, and show bounds on the accumulation of errors in the system. This analysis leads to convergence conditions for traditional BP message passing, and both strict bounds and estimates of the resulting error in systems of approximate BP message passing.

[ BibTex ] | [ PDF ] | [ PS ]



Message Errors in Belief Propagation

Ihler, Fisher, Willsky

Belief propagation (BP) is an increasingly popular method of performing approximate inference on arbitrary graphical models. At times, even further approximations are required, whether from quantization or other simplified message representations or from stochastic approximation methods. Introducing such errors into the BP message computations has the potential to adversely affect the solution obtained. We analyze this effect with respect to a particular measure of message error, and show bounds on the accumulation of errors in the system. This leads both to convergence conditions and error bounds in traditional and approximate BP message passing.

[ BibTex ] | [ PDF ] | [ PS ]



Nonparametric Hypothesis Tests for Statistical Dependency

Ihler, Fisher, Willsky

Determining the structure of dependencies among a set of variables is a common task in many signal and image processing applications, including multi-target tracking and computer vision. In this paper we present an information-theoretic, machine learning approach to problems of this type. We cast this problem as a hypothesis test between factorizations of variables into mutually independent subsets. We show that the likelihood ratio can be written as sums of two sets of Kullback-Leibler (KL) divergence terms. The first set captures the structure of the statistical dependencies within each hypothesis, while the second set measures the details of model differences between hypotheses. We then consider the case when the signal prior models are unknown, so that the distributions of interest must be estimated directly from data, showing that the second set of terms is (asymptotically) negligible and quantifying the loss in hypothesis separability when the models are completely unknown. We demonstrate the utility of nonparametric estimation methods for such problems, providing a general framework for determining and distinguishing between dependency structures in highly uncertain environments. Additionally, we develop a machine learning approach for estimating lower bounds on KL-divergence and mutual information from samples of high-dimensional random variables for which direct density estimation is infeasible. We present empirical results in the context of three prototypical applications: association of signals generated by sources possessing harmonic behavior, scene correspondence using video imagery, and detection of coherent behavior among sets of moving objects.

[ BibTex ] | [ PDF ]



Communications-Constrained Inference

Ihler, Fisher, Willsky

In many applications, particularly power-constrained sensor networks, it is important to conserve the amount of data exchanged while maximizing the utility of that data for some inference task. Broadly, this tradeoff has two major cost components—the representation’s size (in distributed networks, the communications cost) and the error incurred by its use (the inference cost).

We analyze this tradeoff for a particular problem: communicating a particle-based representation (and more generally, a Gaussian mixture or kernel density estimate). We begin by characterizing the exact communication cost of these representations, noting that it is less than might be suggested by traditional communications theory due to the invariance of the representation to reordering. We describe the optimal, lossless encoder when the generating distribution is known, and pose a sub-optimal encoder which still benefits from reordering invariance.

However, lossless encoding may not be sufficient. We describe one reasonable measure of error for distribution-based messages and its consequences for inference in an acyclic network, and propose a novel density approximation method based on KD-tree multiscale representations which enables the communications cost and a bound on error to be balanced efficiently. We show several empirical examples demonstrating the method’s utility in collaborative, distributed signal processing under bandwidth or power constraints.

[ BibTex ] | [ PDF ]



Nonparametric Belief Propagation for Sensor Network Self-Calibration

Ihler, Fisher, Moses, Willsky

Automatic self-calibration of ad-hoc sensor networks is a critical need for their use in military or civilian applications. In general, self-calibration involves the combination of absolute location information (e.g. GPS) with relative calibration information (e.g. estimated distance between sensors) over regions of the network. We formulate the self-calibration problem as a graphical model, enabling application of nonparametric belief propagation (NBP), a recent generalization of particle filtering, for both estimating sensor locations and representing location uncertainties. NBP has the advantage that it is easily implemented in a distributed fashion, can represent multi-modal uncertainty, and admits a wide variety of statistical models. This last point is particularly appealing in that it can be used to provide robustness against occasional high-variance (outlier) noise. We illustrate the performance of NBP using Monte Carlo analysis on an example network.

[ BibTex ] | [ PDF ] | [ PS ]



Nonparametric Belief Propagation for Self-Calibration in Sensor Networks

Ihler, Fisher, Moses, Willsky

Automatic self-calibration of ad-hoc sensor networks is a critical need for their use in military or civilian applications. In general, self-calibration involves the combination of absolute location information (e.g. GPS) with relative calibration information (e.g. time delay or received signal strength between sensors) over regions of the network. Furthermore, it is generally desirable to distribute the computational burden across the network and minimize the amount of inter-sensor communication. We demonstrate that the information used for sensor calibration is fundamentally local with regard to the network topology and use this observation to reformulate the problem within a graphical model framework. We then demonstrate the utility of nonparametric belief propagation (NBP), a recent generalization of particle filtering, for both estimating sensor locations and representing location uncertainties. NBP has the advantage that it is easily implemented in a distributed fashion, admits a wide variety of statistical models, and can represent multi-modal uncertainty. We illustrate the performance of NBP on several example networks while comparing to a previously published nonlinear least squares method.
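
When the relative calibration information is received signal strength rather than a direct range estimate, a common way to turn a reading into a likelihood over inter-sensor distance is the log-distance path-loss model sketched below. This model and its parameter values are textbook assumptions for illustration, not details taken from the paper.

    import numpy as np

    def rss_likelihood(d, rss_dbm, p0=-40.0, eta=2.0, sigma_db=4.0):
        """Likelihood of candidate distances d (meters) given a received-
        signal-strength reading, under the log-distance path-loss model
        RSS = p0 - 10 * eta * log10(d) + N(0, sigma_db^2)."""
        mu = p0 - 10.0 * eta * np.log10(np.maximum(d, 1e-9))
        return np.exp(-0.5 * ((rss_dbm - mu) / sigma_db) ** 2)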

[ BibTex ] | [ PDF ] | [ PS ]



Efficient Multiscale Sampling from Products of Gaussian Mixtures

Ihler, Sudderth, Freeman, Willsky

The problem of approximating the product of several Gaussian mixture distributions arises in a number of contexts, including the nonparametric belief propagation (NBP) inference algorithm and the training of product of experts models. This paper develops two multiscale algorithms for sampling from a product of Gaussian mixtures, and compares their performance to existing methods. The first is a multiscale variant of previously proposed Monte Carlo techniques, with comparable theoretical guarantees but improved empirical convergence rates. The second makes use of approximate kernel density evaluation methods to construct a fast approximate sampler, which is guaranteed to sample points to within a tunable parameter ε of their true probability. We compare both multiscale samplers on a set of computational examples motivated by NBP, demonstrating significant improvements over existing methods.
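
For intuition about the cost being avoided: the product of d mixtures of N Gaussians is itself a mixture of N^d Gaussians, so exact sampling naively requires weighting every component combination. A minimal brute-force baseline for two 1-D mixtures follows; the paper's multiscale samplers are designed to avoid precisely this enumeration.

    import numpy as np

    def sample_product(w1, m1, v1, w2, m2, v2, n_samp, seed=None):
        """Exact samples from the product of two 1-D Gaussian mixtures
        (weights w, means m, variances v) by enumerating all pairs."""
        rng = np.random.default_rng(seed)
        # the product of two Gaussian kernels is again Gaussian
        var = 1.0 / (1.0 / v1[:, None] + 1.0 / v2[None, :])
        mean = var * (m1[:, None] / v1[:, None] + m2[None, :] / v2[None, :])
        # pair weight: mixture weights times the overlap of the kernels
        s = v1[:, None] + v2[None, :]
        z = np.exp(-0.5 * (m1[:, None] - m2[None, :]) ** 2 / s) \
            / np.sqrt(2.0 * np.pi * s)
        w = (w1[:, None] * w2[None, :] * z).ravel()
        w /= w.sum()
        idx = rng.choice(w.size, size=n_samp, p=w)
        return rng.normal(mean.ravel()[idx], np.sqrt(var.ravel()[idx]))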

[ BibTex ] | [ PDF ] | [ PS ]



Nonparametric Belief Propagation

Sudderth, Ihler, Freeman, Willsky

In many applications of graphical models arising in computer vision, the hidden variables of interest are most naturally specified by continuous, non-Gaussian distributions. There exist inference algorithms for discrete approximations to these continuous distributions, but for the high-dimensional variables typically of interest, discrete inference becomes infeasible. Stochastic methods such as particle filters provide an appealing alternative. However, existing techniques fail to exploit the rich structure of the graphical models describing many vision problems. Drawing on ideas from regularized particle filters and belief propagation (BP), this paper develops a nonparametric belief propagation (NBP) algorithm applicable to general graphs. Each NBP iteration uses an efficient sampling procedure to update kernel-based approximations to the true, continuous likelihoods. The algorithm can accommodate an extremely broad class of potential functions, including nonparametric representations. Thus, NBP extends particle filtering methods to the more general vision problems that graphical models can describe. We apply the NBP algorithm to infer component interrelationships in a parts-based face model, allowing location and reconstruction of occluded features.
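
A rough sketch of one stochastic message update of the kind described above, assuming the pairwise potential can be sampled directly; the returned (particles, bandwidth) pair stands in for the continuous kernel-based message. Names are hypothetical, and details such as importance correction and bandwidth selection are omitted.

    import numpy as np

    def nbp_message_update(particles_s, weights_s, sample_pairwise, bw, seed=None):
        """One NBP message step from node s to node t (sketch): resample
        the weighted particles at s, propagate each by drawing
        x_t ~ psi(x_s, .), and attach a kernel bandwidth bw."""
        rng = np.random.default_rng(seed)
        p = weights_s / weights_s.sum()
        idx = rng.choice(len(particles_s), size=len(particles_s), p=p)
        particles_t = np.array([sample_pairwise(particles_s[i]) for i in idx])
        return particles_t, bw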

[ BibTex ] | [ PDF ] | [ PS ]



Hypothesis Testing over Factorizations for Data Association

Ihler, Fisher, Willsky

The issue of data association arises frequently in sensor networks; whenever multiple sensors and sources are present, it may be necessary to determine which observations from different sensors correspond to the same target. In highly uncertain environments, one may need to determine this correspondence without the benefit of an a priori known joint signal/sensor model. This paper examines the data association problem as the more general hypothesis test between factorizations of a single, learned distribution. The optimal test between known distributions may be decomposed into model-dependent and statistical dependence terms, quantifying the cost incurred by model estimation from measurements compared to a test between known models. We demonstrate how one might evaluate a two-signal association test efficiently using kernel density estimation methods to model a wide class of possible distributions, and show the resulting algorithm's ability to determine correspondence in uncertain conditions through a series of synthetic examples. We then describe an extension of this technique to multi-signal association which can be used to determine correspondence while avoiding the computationally prohibitive task of evaluating all hypotheses. Empirical results of the approximate approach are presented.
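
As a toy version of the two-signal test, one can estimate the statistical-dependence term with kernel density estimates and associate each signal with the candidate on which it appears most dependent. The resubstitution estimator below is deliberately simple (and biased); the paper develops the test and its estimators more carefully.

    import numpy as np
    from scipy.stats import gaussian_kde

    def mi_estimate(a, b):
        """Resubstitution KDE estimate of mutual information between two
        scalar signals from paired samples: mean log p(a,b)/(p(a)p(b))."""
        joint = gaussian_kde(np.vstack([a, b]))
        pa, pb = gaussian_kde(a), gaussian_kde(b)
        return np.mean(np.log(joint(np.vstack([a, b])) / (pa(a) * pb(b))))

    def associate(a, candidates):
        """Pair signal a with its most dependent candidate; this captures
        the dependence term only, not the full likelihood-ratio test."""
        return max(range(len(candidates)),
                   key=lambda i: mi_estimate(a, candidates[i]))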

[ BibTex ] | [ PDF ] | [ PS ]



Nonparametric Belief Propagation

Sudderth, Ihler, Freeman, Willsky

In applications of graphical models arising in fields such as computer vision, the hidden variables of interest are most naturally specified by continuous, non-Gaussian distributions. However, due to the limitations of existing inference algorithms, it is often necessary to form coarse, discrete approximations to such models. In this paper, we develop a nonparametric belief propagation (NBP) algorithm, which uses stochastic methods to propagate kernel-based approximations to the true continuous messages. Each NBP message update is based on an efficient sampling procedure which can accommodate an extremely broad class of potential functions, allowing easy adaptation to new application areas. We validate our method using comparisons to continuous BP for Gaussian networks, and an application to the stereo vision problem.
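
For the Gaussian-network comparison mentioned above, BP messages have exact closed forms. The sketch below is a minimal dense implementation of Gaussian belief propagation in information form, a standard construction included for orientation rather than code taken from the paper.

    import numpy as np

    def gabp(J, h, iters=100):
        """Gaussian BP for p(x) proportional to exp(-x'Jx/2 + h'x);
        returns marginal means and variances (exact on trees)."""
        n = len(h)
        Jm = np.zeros((n, n))  # Jm[i, j]: precision of the message i -> j
        hm = np.zeros((n, n))
        edges = [(i, j) for i in range(n) for j in range(n)
                 if i != j and J[i, j] != 0.0]
        for _ in range(iters):
            for i, j in edges:
                Jhat = J[i, i] + Jm[:, i].sum() - Jm[j, i]
                hhat = h[i] + hm[:, i].sum() - hm[j, i]
                Jm[i, j] = -J[i, j] ** 2 / Jhat
                hm[i, j] = -J[i, j] * hhat / Jhat
        Jpost = np.diag(J) + Jm.sum(axis=0)
        return (h + hm.sum(axis=0)) / Jpost, 1.0 / Jpost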

[ BibTex ] | [ PDF ] | [ PS ]



Nonparametric Estimators for Online Signature Authentication

Ihler, Fisher, Willsky

We present extensions to our previous work in modelling dynamical processes. The approach uses an information-theoretic criterion for searching over subspaces of the past observations, combined with a nonparametric density characterizing one-step-ahead predictions and their uncertainty. We use this methodology to model handwriting stroke data, specifically signatures, as a dynamical system, and show that it is possible to learn a model capturing their dynamics for use both in synthesizing realistic signatures and in discriminating between signatures and forgeries, even though no forgeries were used in constructing the model. This novel approach yields promising results even for small training sets.
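
A stripped-down version of the scoring idea, with a fixed lag window standing in for the learned statistic (that substitution is an assumption made for brevity): fit a joint kernel density over (past window, next sample) pairs pooled from genuine signatures, then score a questioned signature by its average one-step-ahead log predictive density. Genuine signatures should score near the training population; forgeries, having different dynamics, should score lower.

    import numpy as np
    from scipy.stats import gaussian_kde

    def make_pairs(seq, lag=2):
        # rows are [x_{t-lag}, ..., x_{t-1}, x_t]
        return np.array([seq[t - lag:t + 1] for t in range(lag, len(seq))])

    def fit_model(genuine_seqs, lag=2):
        """KDEs over (past window, next sample) and over past windows."""
        data = np.vstack([make_pairs(s, lag) for s in genuine_seqs])
        return gaussian_kde(data.T), gaussian_kde(data[:, :-1].T)

    def score(seq, joint, past, lag=2):
        """Average log p(x_t | past window) over the sequence."""
        p = make_pairs(seq, lag)
        return np.mean(np.log(joint(p.T)) - np.log(past(p.T)))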

[ BibTex ] | [ PDF ] | [ PS ]



Learning Informative Statistics: A Nonparametric Approach

Fisher, Ihler, Viola

We discuss an information-theoretic approach for categorizing and modeling dynamic processes. The approach learns a compact and informative statistic which summarizes past states to predict future observations. Furthermore, the uncertainty of the prediction is characterized nonparametrically by a joint density over the learned statistic and the present observation. We discuss the application of the technique to both noise-driven dynamical systems and random processes sampled from a density which is conditioned on the past. In the first case we show results in which both the dynamics of a random walk and the statistics of the driving noise are captured. In the second case we present results in which a summarizing statistic is learned on noisy random telegraph waves with differing dependencies on past states. In both cases the algorithm yields a principled approach for discriminating between processes with differing dynamics and/or dependencies. The method is grounded in ideas from information theory and nonparametric statistics.
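
One way to generate the telegraph-wave testbed described here is to let the flip probability depend on how long the wave has held its current state; the age dependence below is an illustrative choice, not necessarily the paper's exact construction.

    import numpy as np

    def telegraph_wave(n, flip_prob, noise=0.2, seed=0):
        """Noisy random telegraph wave: a +/-1 process plus Gaussian
        noise, where flip_prob(age) sets the switching probability as a
        function of how long the current state has been held."""
        rng = np.random.default_rng(seed)
        x, state, age = np.empty(n), 1.0, 0
        for t in range(n):
            if rng.random() < flip_prob(age):
                state, age = -state, 0
            else:
                age += 1
            x[t] = state
        return x + noise * rng.standard_normal(n)

    # two processes with differing dependence on past states:
    w1 = telegraph_wave(500, lambda a: 0.05)              # memoryless
    w2 = telegraph_wave(500, lambda a: min(0.02 * a, 1))  # age-dependent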

[ BibTex ] | [ PDF ] | [ PS ]