CS 274A: Background Notes and Reading, Winter 2017
Notes (Note Sets 1 to 3 are particularly relevant for the 1st and 2nd week)
Textbooks (recommended for background reading but not required)
General Background/Review Material on Probability
- Deep Learning, by Goodfellow, Bengio, and Courville, MIT Press, 2016.
Even though this text is mostly about deep learning (Parts II and III, beyond the scope of our class),
Part I is about probabilistic
learning in general and provides a lot of very useful background material for this class.
- Machine Learning: A Probabilistic Perspective,
by Kevin Murphy, MIT Press, 2012. The current standard reference text for probabilistic machine learning. Covers
far more than we will cover in this 10-week class.
If you plan to use machine learning in your research after this class, you may want to go ahead and buy
a copy of this text - you will find it to be a very useful reference in your research.
- Bayesian Reasoning and Machine Learning,
by David Barber, Cambridge University Press. Another useful reference text on probabilistic learning (the PDF version is free).
- Model-based machine learning, Chris Bishop,
Phil Trans R. Soc. A, 2012.
A well-written overview article that reviews some of the key ideas behind probabilistic model-based learning.
Learning from Data using Maximum Likelihood
- Topics: random variables, conditional and joint
probabilities, Bayes rule, law of total probability, chain rule
and factorization. Frequentist and Bayesian views of probability. Sets of random variables, the
multivariate Gaussian model. Conditional independence and graphical models.
- Note Sets 1 and 2 above
- Goodfellow et al text: Chapter 3, Probability and
Information Theory (well worth reading before doing Homework 1)
- Barber text: Chapter 1.1, 1.2 (basic probability), Chapter 2 (graphs), Chapter 3.1 to 3.3 (directed
graphical models, referred to here as 'belief networks'), sections 8.1 to 8.4 (univariate and
- Murphy text:
Chapter 1 (introduction), Chapter 2.1 through 2.5 (probability and distributions), Chapter
10.1 to 10.3 (graphical models)
- See also class notes from Kevin
Murphy on directed graphical
models, Markov models, and multivariate Gaussians
- Excellent 15 minute video on multivariate
Gaussian distributions from our own Alex Ihler
- Chapter from Chris Bishop's book on graphical models
- Topics: Concepts of models and
parameters. Definition of the likelihood function and the principle of maximum likelihood
parameter estimation. Using maximum likelihood methods to learn the parameters of Gaussian models, binomial,
multivariate and other parametric models.
- Note Set 3 above
- Barber text: pages 174-177
- Tutorial paper on maximum likelihood estimation
Optimization Methods for Machine Learning
- Topics: General principles of Bayesian estimation: prior densities, posterior densities, MAP,
fully Bayesian approaches. Beta/binomial and Gaussian examples. Predictive densities, model selection, model averaging.
- Note Sets 1 and 2 above
- Notes on analysis of binomial and multinomial models from Kevin Murphy
- Barber text: pages 191-194 (in Chapter 9, Learning as Inference), Pages 177-179 in Chapter 8, and Chapter 12 on Bayesian Model Selection
- Murphy text: Chapter 3.1 to 3.4 and Chapter 5.1, 5.2, 5.3
- An introductory chapter on the
principles of Bayesian inference by the late David MacKay, from
his excellent book
Information Theory, Inference, and Learning Algorithms. Also a link to a video of David
lecturing on An Introduction to Bayesian
The EM Algorithm, Mixture Models, and Probabilistic Clustering
- Topics: Bayes rule, classification boundaries, discriminant functions, optimal decisions,
Bayes error rate, Gaussian classifiers. Likelihood-based approaches and properties of
objective functions. Logistic regression and neural network models.
- Barber text: pages 229-234 (in Chapter 10 on Naive Bayes),
pages 353-358 on logistic regression (in Chapter 17 on Linear Models)
- Murphy text: pages 101-107 on Gaussian classifiers in Chapter 4, Chapter 8.1, 8.2, 8.3
- Notes on logistic regression
from Charles Elkan
- Paper on
Logistic regression for high-dimensional text data
State-Space and Time-Series Models
- Topics: Mixtures of Gaussians and the associated EM algorithm.
K-means clustering. Mixtures of conditional independence models.
Applications to text data. Underlying theory of the EM algorithm.
- Note Set 4 above (EM for Gaussian mixture models)
- General derivation of the EM Algorithm: pages 404-406 in Barber, pages 363-369 in Murphy
- Barber text: pages 403-416
- Murphy text: pages 337-356 (Chapter 11)
- Jeff Bilmes's
tutorial notes on EM
- Frank Dellaert's tutorial notes on EM
- Liang and Klein's Online EM
with applications to text
- Fraley and Raftery paper on model-based clustering
- Topics: discrete and continuous latent-state space models. Hidden Markov models,
Kalman filters. Basic principles of smoothing and filtering. Parameter estimation methods using EM.
- Barber text: pages 451-471 (in Chapter 23 on Dynamical Models)
- Murphy text: Chapter 17.1 to 17.5
- Sequential modeling using
recurrent neural networks from the Goodfellow et al. text