CS 274A: Background Notes and Reading, Winter 2018
Note that the contents of this page may be updated before the quarter starts or during the quarter.
Notes (Note Sets 1 to 3 are particularly relevant for the 1st and 2nd weeks)
Textbooks (recommended for background reading but not required)
 Deep Learning, by Goodfellow, Bengio, and Courville, MIT Press, 2016.
Even though this text is mostly about deep learning (Sections II and III, which are beyond the scope of our class),
Section I is about probabilistic
learning in general and provides a lot of useful background material for this class.

Machine Learning: A Probabilistic Perspective,
by Kevin Murphy, MIT Press, 2012. The current standard reference text for probabilistic machine learning. Covers
far more than we will cover in this 10-week class.
If you plan to use machine learning in your research after this class you may want to buy
a copy of this text; you will find it to be a very useful reference.
 Bayesian Reasoning and Machine Learning,
by David Barber, Cambridge University Press. Another useful reference text on probabilistic learning (the PDF version is free).
 Probabilistic machine learning and artificial intelligence, Zoubin Ghahramani, Nature, 2015. Good overview article of the role of probability in modern machine learning and AI.
 Model-based machine learning, Chris Bishop,
Phil. Trans. R. Soc. A, 2012.
A well-written overview article that reviews some of the key ideas behind probabilistic model-based learning.
General Background/Review Material on Probability
 Topics: random variables, conditional and joint
probabilities, Bayes rule, law of total probability, chain rule
and factorization. Frequentist and Bayesian views of probability. Sets of random variables, the
multivariate Gaussian model. Conditional independence and graphical
models.
 Note Sets 1 and 2 above
 Goodfellow et al text: Chapter 3, Probability and
Information Theory (well worth reading before doing Homework 1)
 Barber text: Chapter 1.1, 1.2 (basic probability), Chapter 2 (graphs), Chapter 3.1 to 3.3 (directed
graphical models, referred to here as 'belief networks'), sections 8.1 to 8.4 (univariate and
multivariate distributions)
 Murphy text:
Chapter 1 (introduction), Chapter 2.1 through 2.5 (probability and distributions), Chapter
10.1 to 10.3 (graphical models)
 See also class notes from Kevin
Murphy on directed graphical
models, Markov models, and multivariate Gaussians
 Excellent 15 minute video on multivariate
Gaussian distributions from our own Alex Ihler
 Chapter from Chris Bishop's book on graphical models (the material on graphical models starts about 20 pages into the document)
Learning from Data using Maximum Likelihood
 Topics: Concepts of models and
parameters. Definition of the likelihood function and the principle of maximum likelihood
parameter estimation. Using maximum likelihood methods to learn the parameters of Gaussian, binomial,
multivariate, and other parametric models.
 Note Set 3 above
 Barber text: pages 174-177
 Tutorial paper on maximum likelihood estimation
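As a small illustration of the topics above, the following is a minimal sketch of maximum likelihood estimation for a univariate Gaussian. It is not taken from the note sets or the texts; the function names and the toy data are hypothetical and chosen only for illustration. For a Gaussian the likelihood is maximized in closed form by the sample mean and the (biased, divide-by-N) sample variance.

```python
import math

def gaussian_log_likelihood(xs, mu, var):
    """Log-likelihood of i.i.d. data xs under a Gaussian with mean mu and variance var."""
    n = len(xs)
    squared_error = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * var) - squared_error / (2 * var)

def gaussian_mle(xs):
    """Closed-form maximum likelihood estimates (mean, variance) for a Gaussian."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # divides by n, not n - 1
    return mu, var

# Hypothetical toy data set, for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu_hat, var_hat = gaussian_mle(data)
best_ll = gaussian_log_likelihood(data, mu_hat, var_hat)
```

Evaluating gaussian_log_likelihood at any other (mu, var) pair should give a value no larger than best_ll, which is a quick way to check the estimates numerically.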
Bayesian Learning
 Topics: General principles of Bayesian estimation: prior densities, posterior densities, MAP,
fully Bayesian approaches. Beta/binomial and Gaussian examples. Predictive densities, model selection, model averaging.
 Note Sets 1 and 2 above
 Notes on analysis of binomial and multinomial models from Kevin Murphy
 Barber text: pages 191-194 (in Chapter 9, Learning as Inference), pages 177-179 in Chapter 8, and Chapter 12 on Bayesian Model Selection
 Murphy text: Chapter 3.1 to 3.4 and Chapter 5.1, 5.2, 5.3
 An introductory chapter on the
principles of Bayesian inference by the late David MacKay, from
his excellent book
Information Theory, Inference, and Learning Algorithms. Also a link to a video of David
lecturing on An Introduction to Bayesian
Inference.
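To make the Beta/binomial example concrete, here is a minimal sketch of the conjugate update, assuming a Beta(alpha, beta) prior on the success probability and binomial data. The function name, the prior values, and the counts are hypothetical and only for illustration. The sketch contrasts the maximum likelihood estimate, the MAP estimate, and the posterior mean (which, for this model, is also the predictive probability of a success on the next trial).

```python
def beta_binomial_summary(alpha, beta, heads, tails):
    """Summarize the posterior for a binomial success probability under a Beta prior."""
    # Conjugacy: a Beta(alpha, beta) prior becomes Beta(alpha + heads, beta + tails).
    a_post = alpha + heads
    b_post = beta + tails
    n = heads + tails
    return {
        "posterior": (a_post, b_post),
        "mle": heads / n,
        # Mode of the Beta posterior; this formula assumes a_post > 1 and b_post > 1.
        "map": (a_post - 1) / (a_post + b_post - 2),
        # Posterior mean, also the predictive probability of a success on the next trial.
        "posterior_mean": a_post / (a_post + b_post),
    }

# Hypothetical example: Beta(2, 2) prior, then 7 successes and 3 failures observed.
summary = beta_binomial_summary(alpha=2, beta=2, heads=7, tails=3)
```

With these assumed numbers the prior pulls both the MAP estimate and the posterior mean toward 0.5 relative to the maximum likelihood estimate, and the effect shrinks as more data are observed.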
Optimization Methods for Machine Learning
Regression Models
Probabilistic Classification
 Topics: Bayes rule, classification boundaries, discriminant functions, optimal decisions,
Bayes error rate, Gaussian classifiers. Likelihood-based approaches and properties of
objective functions. Logistic regression and neural network models.
 Barber text: pages 229-234 (in Chapter 10 on Naive Bayes),
pages 353-358 on logistic regression (in Chapter 17 on Linear Models)
 Murphy text: pages 101-107 on Gaussian classifiers in Chapter 4, Chapter 8.1, 8.2, 8.3
 Notes on logistic regression
from Charles Elkan
 Paper on
Logistic regression for high-dimensional text data
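As one illustration of a likelihood-based approach to classification, the following is a minimal sketch of logistic regression with a single feature, fit by plain gradient ascent on the log-likelihood. It is not drawn from any of the readings above; the function names, step size, iteration count, and toy data are assumptions made for illustration, and practical implementations typically use second-order or stochastic methods and add regularization.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(w, b, xs, ys):
    """Bernoulli log-likelihood of labels ys given inputs xs under weight w and bias b."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit_logistic(xs, ys, step=0.1, iters=2000):
    """Gradient ascent on the log-likelihood, starting from w = b = 0."""
    w, b = 0.0, 0.0
    for _ in range(iters):
        # Gradient of the log-likelihood: sum over points of (y - p) times the input.
        errors = [y - sigmoid(w * x + b) for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs))
        grad_b = sum(errors)
        w += step * grad_w
        b += step * grad_b
    return w, b

# Hypothetical toy data; the classes overlap, so the estimates stay finite.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]
w_hat, b_hat = fit_logistic(xs, ys)
```

The fitted decision boundary is the input value where w_hat * x + b_hat = 0, that is, where the predicted class probability equals 0.5.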
The EM Algorithm, Mixture Models, and Probabilistic Clustering
State-Space and Time-Series Models
 Topics: discrete and continuous latent state-space models. Hidden Markov models,
Kalman filters. Basic principles of smoothing and filtering. Parameter estimation methods using EM.

Tutorial paper on latent-variable models for time-series data, Barber and Cemgil,
IEEE Signal Processing Magazine, 2010.
 Barber text: pages 451-471 (in Chapter 23 on Dynamical Models)
 Murphy text: Chapter 17.1 to 17.5
 Sequential modeling using
recurrent neural networks from the Goodfellow et al. text
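As a small illustration of filtering in a discrete latent-state model, the following is a minimal sketch of the normalized forward recursion for a hidden Markov model. The function name and the two-state parameters are hypothetical and chosen only for illustration. Each step multiplies the predicted state distribution by the emission probabilities, normalizes, and then propagates the result through the transition matrix; the normalizers accumulate into the log-probability of the observations.

```python
import math

def forward_filter(init, trans, emit, observations):
    """Return filtered distributions p(z_t | x_1..x_t) and log p(x_1..x_T).

    init[k]     = p(z_1 = k)
    trans[j][k] = p(z_{t+1} = k | z_t = j)
    emit[k][x]  = p(x_t = x | z_t = k)
    """
    n_states = len(init)
    predicted = list(init)
    filtered = []
    log_evidence = 0.0
    for x in observations:
        # Update: weight the predicted state distribution by the emission probability.
        unnorm = [predicted[k] * emit[k][x] for k in range(n_states)]
        norm = sum(unnorm)  # equals p(x_t | x_1..x_{t-1})
        posterior = [u / norm for u in unnorm]
        filtered.append(posterior)
        log_evidence += math.log(norm)
        # Predict: push the filtered distribution through the transition matrix.
        predicted = [
            sum(posterior[j] * trans[j][k] for j in range(n_states))
            for k in range(n_states)
        ]
    return filtered, log_evidence

# Hypothetical two-state model with binary observations.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.2, 0.8]]
emit = [[0.9, 0.1], [0.3, 0.7]]
filtered, log_evidence = forward_filter(init, trans, emit, [0, 1])
```

Smoothing adds a backward pass over the same quantities, and the Kalman filter follows the same predict/update pattern with Gaussian distributions in place of the discrete tables.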
Sampling Methods