CS 274A: Background Notes and Reading, Winter 2018
Note that the contents of this page may be updated before the quarter starts or during the quarter.
Notes (Note Sets 1 to 3 are particularly relevant for the 1st and 2nd week)
Textbooks (recommended for background reading but not required)
General Background/Review Material on Probability
- Deep Learning, by Goodfellow, Bengio, and Courville, MIT Press, 2016.
Even though this text is mostly about deep learning (Parts II and III, which are beyond the scope of our class),
Part I is about probabilistic
learning in general and provides a lot of useful background material for this class.
- Machine Learning: A Probabilistic Perspective,
by Kevin Murphy, MIT Press, 2012. The current standard reference text for probabilistic machine learning. Covers
far more than we will cover in this 10-week class.
If you plan to use machine learning in your research after this class you may want to buy
a copy of this text - you will find it to be a very useful reference in your research.
- Bayesian Reasoning and Machine Learning,
by David Barber, Cambridge University Press. Another useful reference text on probabilistic learning (the PDF version is free).
- Probabilistic machine learning and artificial intelligence, Zoubin Ghahramani, Nature, 2015. Good overview article of the role of probability in modern machine learning and AI.
- Model-based machine learning, Chris Bishop,
Phil Trans R. Soc. A, 2012.
A well-written overview article that reviews some of the key ideas behind probabilistic model-based learning.
Learning from Data using Maximum Likelihood
- Topics: random variables, conditional and joint
probabilities, Bayes rule, law of total probability, chain rule
and factorization. Frequentist and Bayesian views of probability. Sets of random variables, the
multivariate Gaussian model. Conditional independence and graphical models.
- Note Sets 1 and 2 above
- Goodfellow et al text: Chapter 3, Probability and
Information Theory (well worth reading before doing Homework 1)
- Barber text: Chapter 1.1, 1.2 (basic probability), Chapter 2 (graphs), Chapter 3.1 to 3.3 (directed
graphical models, referred to here as 'belief networks'), sections 8.1 to 8.4 (univariate and
- Murphy text:
Chapter 1 (introduction), Chapter 2.1 through 2.5 (probability and distributions), Chapter
10.1 to 10.3 (graphical models)
- See also class notes from Kevin
Murphy on directed graphical
models, Markov models, and multivariate Gaussians
- Excellent 15 minute video on multivariate
Gaussian distributions from our own Alex Ihler
- Chapter from Chris Bishop's book on graphical models (the material on graphical models starts about 20 pages into the document)
- Topics: Concepts of models and
parameters. Definition of the likelihood function and the principle of maximum likelihood
parameter estimation. Using maximum likelihood methods to learn the parameters of Gaussian, binomial,
multivariate, and other parametric models.
- Note Set 3 above
- Barber text: pages 174-177
- Tutorial paper on maximum likelihood estimation
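As a small illustration of the maximum likelihood topics listed above, here is a minimal sketch (not taken from the course notes) of closed-form maximum likelihood estimation for a univariate Gaussian. The function names `gaussian_mle` and `gaussian_log_likelihood` and the toy data are assumptions made for this example. Note that the maximum likelihood variance estimate divides by n, not n - 1.

```python
import math


def gaussian_mle(xs):
    """Closed-form ML estimates (mean, variance) for a univariate Gaussian."""
    n = len(xs)
    mu = sum(xs) / n
    # The ML estimate of the variance uses 1/n (it is biased for small n).
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var


def gaussian_log_likelihood(xs, mu, var):
    """Log-likelihood of i.i.d. data xs under N(mu, var)."""
    n = len(xs)
    sq_err = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2.0 * math.pi * var) - sq_err / (2.0 * var)


# Toy data, chosen only for illustration.
data = [2.0, 4.0, 4.0, 6.0]
mu_hat, var_hat = gaussian_mle(data)
print(mu_hat, var_hat)
print(gaussian_log_likelihood(data, mu_hat, var_hat))
```

Evaluating `gaussian_log_likelihood` at any other (mu, var) pair should give a value no larger than the one at the estimates, which is a quick sanity check on the definition of the likelihood function.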
Optimization Methods for Machine Learning
- Topics: General principles of Bayesian estimation: prior densities, posterior densities, MAP,
fully Bayesian approaches. Beta/binomial and Gaussian examples. Predictive densities, model selection, model averaging.
- Note Sets 1 and 2 above
- Notes on analysis of binomial and multinomial models from Kevin Murphy
- Barber text: pages 191-194 (in Chapter 9, Learning as Inference), pages 177-179 in Chapter 8, and Chapter 12 on Bayesian Model Selection
- Murphy text: Chapter 3.1 to 3.4 and Chapter 5.1, 5.2, 5.3
- An introductory chapter on the
principles of Bayesian inference by the late David MacKay, from
his excellent book
Information Theory, Inference, and Learning Algorithms. Also a link to a video of David
lecturing on An Introduction to Bayesian
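To make the Beta/binomial example in the topics above concrete, here is a minimal sketch (not from the course notes) of the conjugate update: a Beta(a, b) prior combined with k successes in n binomial trials gives a Beta(a + k, b + n - k) posterior. The function names, the prior Beta(2, 2), and the counts are assumptions chosen for illustration.

```python
def beta_binomial_update(a, b, k, n):
    """Posterior Beta parameters after observing k successes in n trials."""
    return a + k, b + (n - k)


def beta_mean(a, b):
    """Mean of Beta(a, b); for a posterior, this is also the predictive
    probability that the next trial is a success."""
    return a / (a + b)


def beta_mode(a, b):
    """Mode of Beta(a, b), which is the MAP estimate when applied to a posterior.
    Defined here only for a > 1 and b > 1."""
    if a <= 1 or b <= 1:
        raise ValueError("mode formula requires a > 1 and b > 1")
    return (a - 1) / (a + b - 2)


# Hypothetical prior and data, for illustration only.
a0, b0 = 2.0, 2.0
k, n = 7, 10

a_post, b_post = beta_binomial_update(a0, b0, k, n)
ml_estimate = k / n                        # maximum likelihood
map_estimate = beta_mode(a_post, b_post)   # posterior mode
mean_estimate = beta_mean(a_post, b_post)  # posterior mean
print(ml_estimate, map_estimate, mean_estimate)
```

With this prior both Bayesian point estimates are pulled from the maximum likelihood value 0.7 toward the prior mean 0.5, and the effect shrinks as n grows.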
The EM Algorithm, Mixture Models, and Probabilistic Clustering
State-Space and Time-Series Models
- Topics: Bayes rule, classification boundaries, discriminant functions, optimal decisions,
Bayes error rate, Gaussian classifiers. Likelihood-based approaches and properties of
objective functions. Logistic regression and neural network models.
- Barber text: pages 229-234 (in Chapter 10 on Naive Bayes),
pages 353-358 on logistic regression (in Chapter 17 on Linear Models)
- Murphy text: pages 101-107 on Gaussian classifiers in Chapter 4, Chapter 8.1, 8.2, 8.3
- Notes on logistic regression
from Charles Elkan
- Paper on
Logistic regression for high-dimensional text data
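As a sketch of the likelihood-based approach to logistic regression mentioned in the topics above, the following fits a one-feature model by plain gradient ascent on the log-likelihood. This is an illustrative assumption and not the method of any of the readings listed; the function names, step size, iteration count, and toy data are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_likelihood(w, b, xs, ys):
    """Bernoulli log-likelihood of labels ys given inputs xs."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Gradient ascent on the average log-likelihood, starting from zero."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the log-likelihood: sum over (y - p) * x and (y - p).
        errors = [y - sigmoid(w * x + b) for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs))
        grad_b = sum(errors)
        w += lr * grad_w / n
        b += lr * grad_b / n
    return w, b


# Toy data that is deliberately not linearly separable, so the ML estimate is finite.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

w_hat, b_hat = fit_logistic(xs, ys)
print(w_hat, b_hat, log_likelihood(w_hat, b_hat, xs, ys))
```

Because the log-likelihood is concave in (w, b), a small enough fixed step size increases it at every iteration. Newton-style methods such as iteratively reweighted least squares typically converge in far fewer steps.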
- Topics: discrete and continuous latent-state space models. Hidden Markov models,
Kalman filters. Basic principles of smoothing and filtering. Parameter estimation methods using EM.
- Tutorial paper on latent-variable models for time-series data, Barber and Cemgil,
IEEE Signal Processing Magazine, 2010.
- Barber text: pages 451-471 (in Chapter 23 on Dynamical Models)
- Murphy text: Chapter 17.1 to 17.5
- Sequential modeling using
recurrent neural networks from the Goodfellow et al. text
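For the filtering topic listed above, here is a minimal sketch (not from the course notes) of the forward recursion for a discrete hidden Markov model. It computes the filtered distributions p(z_t | x_1, ..., x_t) and the log-likelihood of the observation sequence. The function name `forward_filter` and the two-state parameters are assumptions chosen for illustration.

```python
import math


def forward_filter(init, trans, emit, observations):
    """Forward (filtering) recursion for a discrete HMM.

    init[i]     : p(z_1 = i)
    trans[i][j] : p(z_t = j | z_{t-1} = i)
    emit[j][x]  : p(x_t = x | z_t = j)
    Returns (filtered, log_lik), where filtered[t][j] = p(z_t = j | x_1..x_t).
    """
    n_states = len(init)
    filtered = []
    log_lik = 0.0
    prev = None
    for t, obs in enumerate(observations):
        if t == 0:
            predicted = list(init)
        else:
            # Predict step: push the previous filtered belief through the dynamics.
            predicted = [
                sum(prev[i] * trans[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        # Update step: weight by the likelihood of the observation, then normalize.
        unnorm = [predicted[j] * emit[j][obs] for j in range(n_states)]
        norm = sum(unnorm)  # equals p(x_t | x_1..x_{t-1})
        log_lik += math.log(norm)
        prev = [u / norm for u in unnorm]
        filtered.append(prev)
    return filtered, log_lik


# Hypothetical two-state, two-symbol model.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.8, 0.2], [0.2, 0.8]]
obs = [0, 0, 1]

filtered, log_lik = forward_filter(init, trans, emit, obs)
print(filtered[-1], log_lik)
```

Normalizing at each step keeps the recursion numerically stable on long sequences, and the per-step normalizers multiply to give the sequence likelihood. The Kalman filter follows the same predict/update pattern with Gaussian densities in place of the discrete sums.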