Time
|
11am-12pm |
Place
|
ICS 432 / IERF 127 (see announcement)
|
| Goal |
The purpose of this journal club is to have an informal
meeting to read and discuss papers of importance
or great potential for research. The main focus will
be the interface of machine learning and
statistics. The format will be informal talks
led by a discussant. The discussant makes his/her own
call regarding the relevance of the papers and topics,
and in general, a wide spectrum of subject matter
is welcome.
Our intention is to do some small things in the hope of
creating an interactive environment for all participants.
It is NOT our intention to have many tutorial
sessions, but discussions are highly appreciated, so
feel free to ask any question.
If you want to have some fun or contribute a session, please
send email to Max (welling@uci) or Gang (liang@uci).
|
|
| Jan 21 |
Hierarchical Dirichlet Process, (Gang Liang),
ICS 432
|
| Jan 28 |
Max Welling
A very simple idea on "How to Turn Stochastic Gradient
Descent into Langevin Sampling from the Posterior
Distribution". This is *very* preliminary work and I'd
simply like some feedback (read: shoot holes in the
idea). For those who are using gradient descent on the
log-likelihood (e.g. by backpropagation through a
neural network), here is a chance to "upgrade" your
algorithm for free into a Bayesian sampler.
ABSTRACT: For very large redundant data-sets,
stochastic gradient descent is often the only route to
learning the parameters of a model. Unfortunately,
since we never compute the true gradient, we jump
around the true maximum of the log-likelihood. The
scale of this noise is determined by the stepsize, and
in principle we would need to anneal the stepsize to
zero to converge to the true maximum. In Bayesian
techniques, on the other hand, we often artificially
generate noise to assist us in sampling the posterior
distribution. A nice example is the Langevin equation
where Gaussian noise of a certain variance is added to
the gradient. In this paper we propose to combine these
two objectives: use the artificial noise generated by
stochastic gradient descent to approximately sample
from the posterior. The algorithm is embarrassingly
simple: use Langevin, but in adding Gaussian noise,
subtract the noise generated by estimating the
derivative of the log-likelihood using a single
data-case.
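
For discussion purposes, here is a rough sketch of how the recipe in the abstract might look on a toy problem. Everything concrete in it is an assumption made for illustration and is not taken from the talk: the model (unit-variance Gaussian with unknown mean and a flat prior), the function names `sgd_langevin_sketch` and `run_demo`, the step size, and the reading of "subtract the noise" as "inject only the Gaussian variance that the single-data-case gradient does not already supply".

```python
import math
import random


def sgd_langevin_sketch(data, step, n_steps, seed=0):
    """Sample the mean of a unit-variance Gaussian (flat prior, assumed)
    using single-data-case gradients plus reduced injected Gaussian noise."""
    rng = random.Random(seed)
    n = len(data)
    mean = sum(data) / n
    # Variance of the per-case gradient (x_i - theta). For this toy model it
    # equals the data variance and does not depend on theta.
    case_var = sum((x - mean) ** 2 for x in data) / n
    # Variance that the single-case gradient estimate adds to one update.
    sgd_noise_var = (step / 2.0) ** 2 * n ** 2 * case_var
    # Langevin asks for total noise variance `step`; inject only the rest.
    # If the gradient noise already exceeds `step`, nothing is injected.
    injected_std = math.sqrt(max(step - sgd_noise_var, 0.0))

    theta = 0.0
    samples = []
    for _ in range(n_steps):
        x = data[rng.randrange(n)]
        grad_estimate = n * (x - theta)  # unbiased estimate of the full gradient
        theta += 0.5 * step * grad_estimate + injected_std * rng.gauss(0.0, 1.0)
        samples.append(theta)
    return samples


def run_demo():
    """Return (data mean, sampled mean, sampled variance) on synthetic data."""
    rng = random.Random(1)
    data = [rng.gauss(2.0, 1.0) for _ in range(100)]
    samples = sgd_langevin_sketch(data, step=2e-4, n_steps=200_000)
    kept = samples[2_000:]  # discard burn-in
    post_mean = sum(kept) / len(kept)
    post_var = sum((s - post_mean) ** 2 for s in kept) / len(kept)
    return sum(data) / len(data), post_mean, post_var
```

Under these assumptions the exact posterior is Gaussian with mean equal to the data mean and variance 1/N, so the values returned by `run_demo()` can be compared against that. Note that the gradient noise here is not Gaussian, which is one obvious place to shoot holes in the idea.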
|
| Feb 4 |
AISTATS overview, (Max Welling), ICS 432
A brief overview of some of the more interesting
contributions to AISTATS.
|
| Feb 18 |
David van Dyk, IERF 127
|
| Mar 4 |
Learning models for graph and network data,
(Padhraic Smyth)
- Hoff, P., Raftery, A.E. and Handcock, M.S. (2002). Latent Space Approaches to
Social Network Analysis, Journal of the American
Statistical Association, 97, 1090-1098.
- J.-P. Vert and Y. Yamanishi, Supervised graph
inference, Advances in Neural Information
Processing Systems 17 (NIPS 2004), 2005 (to appear).
|
| Mar 18 |
Pierre Baldi, TBA.
|
|
|