School of Information and Computer Science
Department of Statistics

Machine Learning/Statistics Journal Club --- Winter 2005
General Information
Time  
11am-12pm
Place  
ICS 432 / IERF 127 (see announcement)
Goal
The purpose of this journal club is to hold informal meetings to read and discuss papers of importance or of great potential for research. The main focus will be the interface of machine learning and statistics. The format will be informal talks led by a discussant. The discussant makes his/her own call regarding the relevance of the papers and topics, and in general, a wide spectrum of subject matter is welcome.

We hope to create an interactive environment for all participants. It is NOT our intention to have many tutorial sessions, but discussion is highly appreciated, so feel free to ask questions.

If you want to have some fun or lead a session, please send an email to Max (welling@uci) or Gang (liang@uci).

Schedules (tentative)
Jan 21   Hierarchical Dirichlet Process (Gang Liang), ICS 432

Jan 28   Max Welling

A very simple idea on "How to Turn Stochastic Gradient Descent into Langevin Sampling from the Posterior Distribution". This is *very* preliminary work and I'd simply like some feedback (read: shoot holes in the idea). For those who are using gradient descent on the log-likelihood (e.g. by backpropagation through a neural network), here is a chance to "upgrade" your algorithm into a Bayesian sampler for free.

ABSTRACT: For very large, redundant datasets, stochastic gradient descent is often the only route to learning the parameters of a model. Unfortunately, since we never compute the true gradient, we jump around the true maximum of the log-likelihood. The scale of this noise is determined by the step size, and in principle we would need to anneal the step size to zero to converge to the true maximum. Bayesian techniques, on the other hand, often generate artificial noise to help sample from the posterior distribution. A nice example is the Langevin equation, where Gaussian noise of a certain variance is added to the gradient. In this paper we propose to combine these two objectives: use the noise generated by stochastic gradient descent to approximately sample from the posterior. The algorithm is embarrassingly simple: use Langevin, but when adding Gaussian noise, subtract the noise already generated by estimating the derivative of the log-likelihood from a single data case.
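One possible reading of the abstract's last sentence, sketched below under stated assumptions (not the method presented at the session): a discretized Langevin step is theta <- theta + (eps/2) * grad log p(theta | X) + eta with eta ~ N(0, eps). If the full gradient is replaced by N times the gradient of a single data case, that estimate already carries noise, so we inject only the remaining variance eps minus the stochastic-gradient variance. The toy model (Gaussian data with unknown mean, Gaussian prior), the step-size rule, and the function name `langevin_single_case` are all hypothetical choices made for illustration; in this toy model the per-case gradient variance does not depend on theta, so it is computed once from the data.

```python
import math
import random
import statistics


def langevin_single_case(x, sigma=1.0, tau=10.0, n_steps=400_000, burn_in=20_000, seed=0):
    """Hypothetical toy sketch: x_i ~ N(theta, sigma^2), prior theta ~ N(0, tau^2).

    Each step uses the gradient of ONE data case (scaled by N) and adds only
    the part of the Langevin noise variance that the stochastic gradient does
    not already supply. Returns the post-burn-in samples of theta.
    """
    rng = random.Random(seed)
    n = len(x)
    # Per-case gradient: d/dtheta log N(x_i | theta, sigma^2) = (x_i - theta) / sigma^2.
    # Its variance over i is Var(x) / sigma^4 for every theta in this model.
    case_grad_var = statistics.pvariance(x) / sigma**4
    # The stochastic-gradient noise per step has variance (eps/2)^2 * n^2 * case_grad_var.
    # Langevin wants total variance eps, so eps must stay below 4 / (n^2 * case_grad_var);
    # taking half of that bound is an arbitrary choice for this sketch.
    eps = 0.5 * 4.0 / (n**2 * case_grad_var)
    grad_noise_var = (eps / 2) ** 2 * n**2 * case_grad_var
    # Inject only the "missing" variance (clamped at zero).
    inject_sd = math.sqrt(max(eps - grad_noise_var, 0.0))
    theta, samples = 0.0, []
    for t in range(n_steps):
        i = rng.randrange(n)
        # Unbiased single-case estimate of the full log-posterior gradient.
        grad = n * (x[i] - theta) / sigma**2 - theta / tau**2
        theta += 0.5 * eps * grad + rng.gauss(0.0, inject_sd)
        if t >= burn_in:
            samples.append(theta)
    return samples
```

For this conjugate model the exact posterior is Gaussian with precision N/sigma^2 + 1/tau^2 and mean (sum of x_i / sigma^2) / precision, so the sampler's mean and standard deviation can be compared against it directly, e.g. on 1000 points drawn from N(2, 1). Note the trade-off the sketch exposes: keeping the single-case gradient noise below the Langevin variance forces eps to shrink roughly like 1/N^2, which makes mixing slow for large datasets.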

Feb 4   AISTATS (Max Welling), ICS 432

A brief overview of some of the more interesting contributions to AISTATS.


Feb 18   David van Dyk, IERF 127
Mar 4   Learning models for graph and network data (Padhraic Smyth)
Mar 18   Pierre Baldi, TBA.
Interesting Papers
Leo Breiman, Machine Learning, Wald Lecture I, 2002
Leo Breiman, Looking Inside the Black Box, Wald Lecture II, 2002
Leo Breiman, Software for the Masses, Wald Lecture III, 2002

Please let Max or Gang know if there is a paper you think would be fun to read.