- Notation Guide
- Note Set 1: Review of Probability (PDF)
- Note Set 2: Multivariate Probability Models (PDF)
- Note Set 3: Models, Parameters, and Likelihood (PDF)
- Note Set 4: Mixture Models and the EM Algorithm (PDF)
- Note Set 5: Hidden Markov Models (PDF)
- Notes on Discriminant Functions and Optimal Classification (PDF)

- Deep Learning, by Goodfellow, Bengio, and Courville, MIT Press, 2016. Even though this text is mostly about deep learning (Parts II and III, beyond the scope of our class), Part I is about probabilistic learning in general and provides a lot of very useful background material for this class.
- Machine Learning: A Probabilistic Perspective, by Kevin Murphy, MIT Press, 2012. The current standard reference text for probabilistic machine learning. Covers far more than we will cover in this 10-week class. If you plan to use machine learning in your research after this class, you may want to go ahead and buy a copy; you will find it to be a very useful reference.
- Bayesian Reasoning and Machine Learning, by David Barber, Cambridge University Press. Another useful reference text on probabilistic learning (the PDF version is free).
- Model-based machine learning, by Chris Bishop, *Phil. Trans. R. Soc. A*, 2012. A well-written overview article that reviews some of the key ideas behind probabilistic model-based learning.

- Topics: random variables, conditional and joint probabilities, Bayes rule, law of total probability, chain rule and factorization. Frequentist and Bayesian views of probability. Sets of random variables, the multivariate Gaussian model. Conditional independence and graphical models.
- Note Sets 1 and 2 above
- Goodfellow et al text: Chapter 3, Probability and Information Theory (well worth reading before doing Homework 1)
- Barber text: Chapter 1.1, 1.2 (basic probability), Chapter 2 (graphs), Chapter 3.1 to 3.3 (directed graphical models, referred to here as 'belief networks'), sections 8.1 to 8.4 (univariate and multivariate distributions)
- Murphy text: Chapter 1 (introduction), Chapter 2.1 through 2.5 (probability and distributions), Chapter 10.1 to 10.3 (graphical models)
- See also class notes from Kevin Murphy on directed graphical models, Markov models, and multivariate Gaussians
- Excellent 15-minute video on multivariate Gaussian distributions from our own Alex Ihler
- Chapter from Chris Bishop's book on graphical models
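
As a toy illustration of two of the topics above (Bayes rule and the law of total probability), here is a minimal sketch with invented numbers: a diagnostic test with a given prior, sensitivity, and false-positive rate.

```python
# Toy illustration of Bayes rule and the law of total probability.
# All numbers are invented for illustration.
p_d = 0.01              # prior P(D)
p_pos_given_d = 0.95    # sensitivity P(+|D)
p_pos_given_not_d = 0.05  # false-positive rate P(+|~D)

# Law of total probability: P(+) = P(+|D)P(D) + P(+|~D)P(~D)
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes rule: P(D|+) = P(+|D)P(D) / P(+)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(round(p_d_given_pos, 3))  # about 0.161: still unlikely, despite the positive test
```

Note how the small prior dominates: even an accurate test yields a modest posterior when the condition is rare.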

- Topics: Concepts of models and parameters. Definition of the likelihood function and the principle of maximum likelihood parameter estimation. Using maximum likelihood methods to learn the parameters of Gaussian, binomial, multivariate, and other parametric models.
- Note Set 3 above
- Barber text: pages 174-177
- Tutorial paper on maximum likelihood estimation
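
A quick numerical sketch of maximum likelihood for a univariate Gaussian (data simulated here, not from the notes): the MLE for the mean is the sample mean, and the MLE for the variance uses 1/N rather than 1/(N-1).

```python
import numpy as np

# Simulate data from a Gaussian with known parameters, then recover
# them by maximum likelihood.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=10_000)

mu_hat = x.mean()                     # maximizes the log-likelihood in mu
var_hat = ((x - mu_hat) ** 2).mean()  # MLE divides by N, not N - 1

print(mu_hat, var_hat)  # close to the true values 2.0 and 1.5**2 = 2.25
```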

- Topics: General principles of Bayesian estimation: prior densities, posterior densities, MAP, fully Bayesian approaches. Beta/binomial and Gaussian examples. Predictive densities, model selection, model averaging.
- Note Sets 1 and 2 above
- Notes on analysis of binomial and multinomial models from Kevin Murphy
- Barber text: pages 191-194 (in Chapter 9, Learning as Inference), Pages 177-179 in Chapter 8, and Chapter 12 on Bayesian Model Selection
- Murphy text: Chapter 3.1 to 3.4 and Chapter 5.1, 5.2, 5.3
- An introductory chapter on the principles of Bayesian inference by the late David MacKay, from his excellent book Information Theory, Inference, and Learning Algorithms. Also a link to a video of David lecturing on An Introduction to Bayesian Inference.
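
The Beta/binomial example from the topics above can be written in a few lines (prior pseudo-counts and data are illustrative): with a Beta(a, b) prior on the success probability and k successes in n trials, the posterior is Beta(a + k, b + n - k).

```python
# Beta/binomial conjugacy: prior Beta(a, b), likelihood Binomial(n, theta).
a, b = 2.0, 2.0   # prior pseudo-counts (illustrative)
k, n = 7, 10      # observed successes out of n trials

a_post, b_post = a + k, b + (n - k)   # posterior is Beta(a_post, b_post)

posterior_mean = a_post / (a_post + b_post)           # Bayesian point summary
map_estimate = (a_post - 1) / (a_post + b_post - 2)   # MAP: mode of the Beta posterior
mle = k / n                                           # maximum likelihood, for comparison

print(posterior_mean, map_estimate, mle)
```

The posterior mean (9/14) sits between the MLE (0.7) and the prior mean (0.5), shrinking the estimate toward the prior; with more data the three estimates converge.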

- Geoff Hinton's class notes on optimization for machine learning (a good introduction to the basic concepts)
- Chapter on Numerical Optimization from the Goodfellow et al. text.
- Simple tutorial on gradient descent, with examples in R code.
- Tutorial paper from Leon Bottou on stochastic gradient methods.
- Notes on conjugate gradient descent, from Jonathan Shewchuk (with good insights into geometric aspects of optimization in general)
- Slides from Stephen Wright on optimization and machine learning: IPAM 2012 slides and NIPS 2010 slides (see also video).
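
A minimal gradient descent sketch, in the spirit of the tutorials above (the quadratic objective and step size are invented for illustration): minimize f(w) = 0.5 wᵀAw - bᵀw, whose gradient is Aw - b and whose minimizer solves Aw = b.

```python
import numpy as np

# Fixed-step gradient descent on a small quadratic objective.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

w = np.zeros(2)
step = 0.1   # must be below 2 / (largest eigenvalue of A) to converge
for _ in range(500):
    grad = A @ w - b      # gradient of 0.5 w^T A w - b^T w
    w = w - step * grad   # descent update

print(w, np.linalg.solve(A, b))  # the iterate matches the exact solution [0.2, 0.4]
```

Stochastic gradient methods (Bottou's tutorial above) replace the exact gradient with a noisy estimate from a data subsample, trading per-step accuracy for much cheaper iterations.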

- Topics: Linear models. Normal equations. Systematic and stochastic components. Parameter estimation methods for regression. Maximum likelihood and Bayesian interpretations.
- Andrew Ng's notes on supervised learning (good introduction to the basic concepts)
- Barber text: Chapter 17.1 to 17.3 on linear models
- Murphy text: Chapter 7.1, 7.2, 7.3 and 7.6
- General discussion of classification and regression from Goodfellow et al. text
- Nice blog post on key ideas associated with the bias-variance trade-off
- Slides from CMU on bias-variance
- pages 1 to 33 of a classic paper on the bias/variance tradeoff
- Mike Tipping's review paper on Bayesian regression
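
The normal equations from the topics above in a few lines (data simulated for illustration): the least-squares solution solves XᵀXw = Xᵀy, which is also the maximum likelihood estimate under Gaussian noise.

```python
import numpy as np

# Linear regression via least squares / the normal equations.
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=n)])  # intercept + 1 feature
true_w = np.array([0.5, -2.0])
y = X @ true_w + rng.normal(scale=0.1, size=n)   # systematic + stochastic components

# Solve the least-squares problem (lstsq is numerically preferable to
# forming (X^T X)^{-1} explicitly).
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)  # close to the true [0.5, -2.0]
```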

- Topics: Bayes rule, classification boundaries, discriminant functions, optimal decisions, Bayes error rate, Gaussian classifiers. Likelihood-based approaches and properties of objective functions. Logistic regression and neural network models.
- Barber text: pages 229-234 (in Chapter 10 on Naive Bayes), pages 353-358 on logistic regression (in Chapter 17 on Linear Models)
- Murphy text: pages 101-107 on Gaussian classifiers in Chapter 4, Chapter 8.1, 8.2, 8.3
- Notes on logistic regression from Charles Elkan
- Paper on Logistic regression for high-dimensional text data
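
A sketch of logistic regression fit by gradient descent on the negative log-likelihood (simulated data; step size and iteration count are illustrative): the gradient of the mean negative log-likelihood is Xᵀ(σ(Xw) - y)/n.

```python
import numpy as np

# Logistic regression by gradient descent on the negative log-likelihood.
rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + 1 feature
w_true = np.array([-0.5, 2.0])
p = 1.0 / (1.0 + np.exp(-(X @ w_true)))        # P(y = 1 | x) under the true model
y = (rng.uniform(size=n) < p).astype(float)    # simulated binary labels

w = np.zeros(2)
for _ in range(2000):
    preds = 1.0 / (1.0 + np.exp(-(X @ w)))     # sigmoid of the linear score
    grad = X.T @ (preds - y) / n               # gradient of mean negative log-likelihood
    w -= 0.5 * grad

print(w)  # roughly recovers w_true
```

The same objective is convex in w, so gradient descent finds the global maximum likelihood solution; a neural network replaces the linear score Xw with a nonlinear function while keeping the same log-likelihood objective.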

- Topics: Mixtures of Gaussians and the associated EM algorithm. K-means clustering. Mixtures of conditional independence models. Applications to text data. Underlying theory of the EM algorithm.
- Note Set 4 above (EM for Gaussian mixture models)
- General derivation of the EM Algorithm: pages 404-406 in Barber, pages 363-369 in Murphy
- Barber text: pages 403-416
- Murphy text: pages 337-356 (Chapter 11)
- Jeff Bilmes tutorial notes on EM
- Frank Dellaert's tutorial notes on EM
- Liang and Klein's Online EM with applications to text
- Fraley and Raftery paper on model-based clustering
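
A compact sketch of EM for a two-component 1-D Gaussian mixture, in the spirit of Note Set 4 (data and initial guesses are invented): the E-step computes responsibilities, the M-step re-estimates weights, means, and variances by weighted maximum likelihood.

```python
import numpy as np

# EM for a mixture of two univariate Gaussians.
rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])

pi = np.array([0.5, 0.5])    # mixture weights (initial guess)
mu = np.array([-1.0, 1.0])   # component means (initial guess)
var = np.array([1.0, 1.0])   # component variances (initial guess)

for _ in range(100):
    # E-step: responsibilities r[i, k] = P(component k | x_i)
    dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    r = pi * dens
    r /= r.sum(axis=1, keepdims=True)

    # M-step: weighted maximum likelihood updates
    nk = r.sum(axis=0)
    pi = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk

print(pi, mu, var)  # weights near [0.3, 0.7], means near [-2, 3]
```

K-means is the hard-assignment limit of this loop: each point is assigned entirely to its nearest mean instead of fractionally via responsibilities.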

- Topics: discrete and continuous latent-state space models. Hidden Markov models, Kalman filters. Basic principles of smoothing and filtering. Parameter estimation methods using EM.
- Barber text: pages 451-471 (in Chapter 23 on Dynamical Models)
- Murphy text: Chapter 17.1 to 17.5
- Sequential modeling using recurrent neural networks from the Goodfellow et al. text
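
The filtering recursion for a discrete HMM fits in a few lines (the 2-state model and observation sequence here are made up for illustration): at each step, propagate the posterior through the transition matrix, weight by the observation likelihood, and renormalize.

```python
import numpy as np

# Forward (filtering) recursion for a 2-state discrete HMM.
A = np.array([[0.9, 0.1],     # transition matrix P(z_t | z_{t-1})
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],     # emission matrix P(x_t | z_t), 2 symbols
              [0.1, 0.9]])
init = np.array([0.5, 0.5])   # initial state distribution
obs = [0, 0, 1, 1, 1]         # observed symbol indices

alpha = init * B[:, obs[0]]
alpha /= alpha.sum()                  # normalize: filtered posterior at t = 1
for x in obs[1:]:
    alpha = (A.T @ alpha) * B[:, x]   # predict, then weight by the likelihood
    alpha /= alpha.sum()

print(alpha)  # filtered distribution P(z_T | x_1..T) over the 2 states
```

Smoothing adds a backward pass so each state posterior conditions on the whole sequence, and EM for HMMs (Baum-Welch) uses those smoothed posteriors as its E-step.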

- Topics: Importance sampling, Gibbs sampling, and related ideas
- Tutorial chapter on sampling and Monte Carlo methods from the Goodfellow et al. text
- Barber text: pages 543-553 (in Chapter 27 on Sampling)
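
A minimal importance sampling sketch (target, proposal, and test function chosen for illustration): estimate E_p[f(x)] using samples from a proposal q, weighting each sample by p(x)/q(x). Here p is N(0, 1), q is the wider N(0, 2²), and f(x) = x², whose true expectation under p is 1.

```python
import numpy as np

# Importance sampling: estimate E_p[x^2] with samples drawn from q.
rng = np.random.default_rng(4)
n = 100_000
x = rng.normal(0.0, 2.0, size=n)   # samples from the proposal q = N(0, 4)

# Log-densities of target p = N(0, 1) and proposal q = N(0, 4).
log_p = -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
log_q = -0.5 * (x / 2.0) ** 2 - np.log(2.0) - 0.5 * np.log(2 * np.pi)
w = np.exp(log_p - log_q)          # importance weights p(x) / q(x)

estimate = np.mean(w * x**2)       # unbiased estimate of E_p[x^2]
print(estimate)  # close to the true value 1.0
```

Choosing q wider than p keeps the weights bounded; a proposal narrower than the target can make the weight variance blow up, a standard failure mode discussed in the readings above.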