#### CS 274A: Background Notes and Reading, Winter 2017

###### Textbooks (recommended for background reading but not required)
• Deep Learning, by Goodfellow, Bengio, and Courville, MIT Press, 2016. Although this text is mostly about deep learning (Parts II and III, which are beyond the scope of our class), Part I covers probabilistic learning in general and provides very useful background material for this class.
• Machine Learning: A Probabilistic Perspective, by Kevin Murphy, MIT Press, 2012. The current standard reference text for probabilistic machine learning. Covers far more than we will cover in this 10-week class. If you plan to use machine learning in your research after this class, you may want to go ahead and buy a copy; you will find it to be a very useful reference.
• Bayesian Reasoning and Machine Learning, by David Barber, Cambridge University Press. Another useful reference text on probabilistic learning (the PDF version is free).
• Model-based machine learning, Chris Bishop, Phil Trans R. Soc. A, 2012. A well-written overview article that reviews some of the key ideas behind probabilistic model-based learning.

General Background/Review Material on Probability
• Topics: random variables, conditional and joint probabilities, Bayes rule, law of total probability, chain rule and factorization. Frequentist and Bayesian views of probability. Sets of random variables, the multivariate Gaussian model. Conditional independence and graphical models.
• Note Sets 1 and 2 above
• Goodfellow et al. text: Chapter 3, Probability and Information Theory (well worth reading before doing Homework 1)
• Barber text: Chapter 1.1, 1.2 (basic probability), Chapter 2 (graphs), Chapter 3.1 to 3.3 (directed graphical models, referred to here as 'belief networks'), sections 8.1 to 8.4 (univariate and multivariate distributions)
• Murphy text: Chapter 1 (introduction), Chapter 2.1 through 2.5 (probability and distributions), Chapter 10.1 to 10.3 (graphical models)
• See also class notes from Kevin Murphy on directed graphical models, Markov models, and multivariate Gaussians
• Excellent 15-minute video on multivariate Gaussian distributions from our own Alex Ihler
• Chapter from Chris Bishop's book on graphical models

Learning from Data using Maximum Likelihood
• Topics: Concepts of models and parameters. Definition of the likelihood function and the principle of maximum likelihood parameter estimation. Using maximum likelihood methods to learn the parameters of Gaussian, binomial, multivariate, and other parametric models.
• Note Set 3 above
• Barber text: pages 174-177
• Tutorial paper on maximum likelihood estimation
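
As a small illustration of the maximum likelihood principle listed in the topics above, the sketch below fits a univariate Gaussian to a handful of numbers. It is only a sketch under stated assumptions: the data values and the function names `gaussian_mle` and `gaussian_log_likelihood` are made up for this example and do not come from the class notes.

```python
import math

def gaussian_log_likelihood(xs, mu, var):
    """Log-likelihood of i.i.d. data xs under a Gaussian N(mu, var)."""
    n = len(xs)
    sq_err = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * var) - sq_err / (2 * var)

def gaussian_mle(xs):
    """Closed-form maximum likelihood estimates of the mean and variance.

    The variance estimate divides by n (not n - 1); this is the value
    that maximizes the likelihood, and it is slightly biased.
    """
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var

# Hypothetical data, chosen only for illustration.
data = [2.1, 1.9, 3.2, 2.8, 2.5]
mu_hat, var_hat = gaussian_mle(data)
ll_at_mle = gaussian_log_likelihood(data, mu_hat, var_hat)
print(mu_hat, var_hat, ll_at_mle)
```

Evaluating `gaussian_log_likelihood` at any other `(mu, var)` pair should give a value no larger than `ll_at_mle`, which is one quick way to sanity-check a closed-form estimator.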

Bayesian Learning

Optimization Methods for Machine Learning

Regression Models

Probabilistic Classification
• Topics: Bayes rule, classification boundaries, discriminant functions, optimal decisions, Bayes error rate, Gaussian classifiers. Likelihood-based approaches and properties of objective functions. Logistic regression and neural network models.
• Barber text: pages 229-234 (in Chapter 10 on Naive Bayes), pages 353-358 on logistic regression (in Chapter 17 on Linear Models)
• Murphy text: pages 101-107 on Gaussian classifiers in Chapter 4, Chapter 8.1, 8.2, 8.3
• Notes on logistic regression from Charles Elkan
• Paper on logistic regression for high-dimensional text data

The EM Algorithm, Mixture Models, and Probabilistic Clustering
• Topics: Mixtures of Gaussians and the associated EM algorithm. K-means clustering. Mixtures of conditional independence models. Applications to text data. Underlying theory of the EM algorithm.
• Note Set 4 above (EM for Gaussian mixture models)
• General derivation of the EM Algorithm: pages 404-406 in Barber, pages 363-369 in Murphy
• Barber text: pages 403-416
• Murphy text: pages 337-356 (Chapter 11)
• Jeff Bilmes tutorial notes on EM
• Frank Dellaert's tutorial notes on EM
• Liang and Klein's Online EM with applications to text
• Fraley and Raftery paper on model-based clustering
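
To make the E-step and M-step for Gaussian mixtures concrete, here is a minimal one-dimensional sketch. It is not taken from Note Set 4 or any of the readings above: the function name `em_gmm_1d`, the initialization scheme, the variance floor, and the toy data are all assumptions chosen to keep the example short.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a univariate Gaussian N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, k=2, n_iter=50, var_floor=1e-6):
    """EM for a 1-D mixture of k >= 2 Gaussians (assumes xs is not constant)."""
    n = len(xs)
    lo, hi = min(xs), max(xs)
    # Assumed initialization: means spread evenly over the data range,
    # shared variance equal to the overall data variance, equal weights.
    mus = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    mean = sum(xs) / n
    var0 = sum((x - mean) ** 2 for x in xs) / n
    variances = [var0] * k
    weights = [1.0 / k] * k
    log_liks = []
    for _ in range(n_iter):
        # E-step: posterior membership probabilities for each data point.
        resp = []
        ll = 0.0
        for x in xs:
            joint = [weights[j] * normal_pdf(x, mus[j], variances[j])
                     for j in range(k)]
            total = sum(joint)
            ll += math.log(total)
            resp.append([p / total for p in joint])
        log_liks.append(ll)  # log-likelihood under the current parameters
        # M-step: weighted maximum likelihood updates.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, var_floor)  # guard against collapse
    return weights, mus, variances, log_liks

# Hypothetical toy data: two well-separated clusters.
data = [0.9, 1.0, 1.1, 1.2, 0.8, 4.9, 5.0, 5.1, 5.2, 4.8]
weights, mus, variances, log_liks = em_gmm_1d(data)
print(weights, mus, variances)
```

The recorded `log_liks` should never decrease from one iteration to the next, which is a useful check when debugging an EM implementation.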

State-Space and Time-Series Models
• Topics: discrete and continuous latent-state space models. Hidden Markov models, Kalman filters. Basic principles of smoothing and filtering. Parameter estimation methods using EM.
• Barber text: pages 451-471 (in Chapter 23 on Dynamical Models)
• Murphy text: Chapter 17.1 to 17.5
• Sequential modeling using recurrent neural networks from the Goodfellow et al. text
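
For the filtering topic above, the following sketch runs the forward (filtering) recursion for a discrete hidden Markov model, normalizing at each step. The function name `hmm_filter` and the two-state parameters are hypothetical and chosen only for illustration; they are not drawn from the readings.

```python
import math

def hmm_filter(init, trans, emit, obs):
    """Forward filtering for a discrete HMM.

    init[i]     : P(z_1 = i)
    trans[i][j] : P(z_{t+1} = j | z_t = i)
    emit[j][o]  : P(x_t = o | z_t = j)
    obs         : list of observed symbol indices

    Returns the filtered distributions P(z_t | x_1..x_t) and the
    log-likelihood log P(x_1..x_T).
    """
    k = len(init)
    filtered = []
    log_lik = 0.0
    prior = list(init)
    for t, o in enumerate(obs):
        if t > 0:
            # Predict: push the previous filtered belief through the transitions.
            prev = filtered[-1]
            prior = [sum(prev[i] * trans[i][j] for i in range(k))
                     for j in range(k)]
        # Update: weight by the emission probability, then normalize.
        unnorm = [prior[j] * emit[j][o] for j in range(k)]
        z = sum(unnorm)
        log_lik += math.log(z)
        filtered.append([u / z for u in unnorm])
    return filtered, log_lik

# Hypothetical two-state model with two observation symbols.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.8, 0.2], [0.2, 0.8]]
filtered, log_lik = hmm_filter(init, trans, emit, [0, 0])
print(filtered, math.exp(log_lik))
```

Normalizing at each step keeps the recursion numerically stable on long sequences, and the sum of the log normalizers gives the log-likelihood needed for parameter estimation.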

Sampling Methods