CS 274A | Notes/Reading

Some of the class notes below may be updated during the quarter - if this happens it will be announced on Ed.

Background/Review Material on Probability

Topics covered: random variables, conditional and joint probabilities, Bayes rule, law of total probability, chain rule and factorization. Frequentist and Bayesian views of probability.
Required Reading:
- Class Notes 1 on Probability Concepts
- Sections 6.1 to 6.5 in Mathematics for Machine Learning
Optional Additional References and Reading
- For a review of basic concepts in probability see Grinstead and Snell's Introduction to Probability
- Chapter 2 in Kevin Murphy's Probabilistic Machine Learning: An Introduction
- Murphy course notes on univariate and multivariate Gaussian densities
- Excellent 15-minute video on multivariate Gaussian densities from Alex Ihler

Conditional Independence and Graphical Models

Topics covered: Sets of random variables. Conditional independence and graphical models. Markov models.
Required Reading:
- Class Notes 2 on Graphical Models
Optional Additional References and Reading
- Section 8.5 on directed graphical models in Mathematics for Machine Learning
- Murphy course notes on directed graphical models and on Markov models
- Chapter from Chris Bishop's book on graphical models (the material on graphical models starts about 20 pages into the document)

Learning from Data using Maximum Likelihood

Topics: Concepts of models and parameters. Definition of the likelihood function and the principle of maximum likelihood parameter estimation. Using maximum likelihood methods to learn the parameters of Gaussian, binomial, multivariate, and other parametric models.
Required Reading:
- Class Notes 3 on Likelihood
Recommended Reading:
- Sections 4.1 through 4.2.5 in Murphy's Probabilistic Machine Learning: An Introduction
Optional Additional References and Reading
- Tutorial paper on maximum likelihood estimation

Bayesian Learning

Topics: General principles of Bayesian estimation: prior densities, posterior densities, MAP, fully Bayesian approaches. Beta/binomial and Gaussian examples. Predictive densities, model selection, model averaging.
Required Reading:
- Review Note Sets 1 and 2 again, and read Class Notes 4 on Bayesian Learning
Recommended Reading:
- Section 4.6 in Murphy's Probabilistic Machine Learning: An Introduction
- Murphy course notes on analysis of binomial and multinomial models
Optional Additional References and Reading
- Barber textbook: Sections 9.1 and 9.2 (on Learning as Inference) and Chapter 12 on Bayesian Model Selection
- Chapter 3 from Efron and Hastie on Bayesian Inference
- An introductory chapter on the principles of Bayesian inference by the late David MacKay, from his excellent book Information Theory, Inference, and Learning Algorithms. Also a link to a video of David lecturing on An Introduction to Bayesian Inference.

Optimization Methods for Machine Learning

Topics: General principles of finding minima/maxima of multivariate functions, gradient and Hessian methods, stochastic gradient methods.
Recommended Reading:
- Chapter 7 on Continuous Optimization in Mathematics for Machine Learning
Optional Additional References and Reading
- Geoff Hinton's course slides on optimization for machine learning (a good introduction to the basic concept, focuses on optimization from about slide onwards)
- Chapter on Optimization from the Hardt and Recht text (Patterns, Predictions, and Actions)
- Chapter on Numerical Optimization from the Goodfellow et al. text.
- Tutorial paper from Leon Bottou on stochastic gradient methods.
- Notes on conjugate gradient descent, from Jonathan Shewchuk (with good insights into geometric aspects of optimization in general)
- Optimization for Data Analysis, a comprehensive textbook on optimization methods relevant to machine learning, from Wright and Recht (should be accessible online if you are on the UCI network)

Regression Models

Topics: Linear models. Systematic and stochastic components. Parameter estimation methods for regression. Maximum likelihood and Bayesian interpretations.
Required Reading:
- Class Notes on Regression
Recommended Reading:
- Andrew Ng's notes on supervised learning (good simple introduction to basic concepts)
- Sections 8.1 and 8.2 on empirical risk minimization and Chapter 9 on linear regression in the Mathematics for Machine Learning text.
- Nice blog post on key ideas associated with the bias-variance trade-off
- Slides from CMU on bias-variance
Optional Additional References and Reading
- Murphy text: Introductory sections of Chapters 11, 12, 13 in Probabilistic Machine Learning: An Introduction by Kevin Murphy, 2022
- Chapter 8 on generalized linear models (GLMs) in Efron and Hastie book
- General discussion of classification and regression in Goodfellow et al. book.
- Pages 1 to 33 of a classic paper on the bias/variance tradeoff
- Review paper on Bayesian regression

Classification

Topics: Bayes rule, classification boundaries, discriminant functions, optimal decisions, Bayes error rate. Likelihood-based approaches and properties of objective functions. Logistic regression and neural network models.
Required Reading:
- Class Notes on Discriminant Functions and Optimal Classification (PDF)
Optional Reading:
- Murphy text: Chapter 10 on Logistic Regression in Probabilistic Machine Learning: An Introduction by Kevin Murphy, 2022
- Notes on logistic regression from Charles Elkan

The EM Algorithm, Mixture Models, and Probabilistic Clustering

Topics: Mixtures of Gaussians and the associated EM algorithm. K-means clustering. Underlying theory of the EM algorithm.
Required Reading:
- Class notes on mixture models and EM for Gaussian mixtures
Optional Reading:
- Murphy text: Section 3.5 on Mixture Models, Section 8.7 on Bound Optimization and EM, and Section 21.4 on Clustering using Mixture Models in Probabilistic Machine Learning: An Introduction by Kevin Murphy, 2022
- Notes on derivation of EM algorithm from Justin Domke
- Jeff Bilmes tutorial notes on EM
- The EM Algorithm: a Short Tutorial by S. Borman
- Bayesian learning of mixtures, with Gibbs sampling from Dave Blei
- Fraley and Raftery's JASA paper on model-based clustering

State-Space and Time-Series Models

Topics: Discrete and continuous latent state-space models. Hidden Markov models, Kalman filters.
Optional Reading:
- Tutorial paper on latent-variable models for time-series data, Barber and Cemgil, IEEE Signal Processing Magazine, 2010.
- Murphy text (Book 2): Chapter 29 on State-Space models in Probabilistic Machine Learning: Advanced Topics by Kevin Murphy, January 2023
- Murphy text (Book 1): Chapter 15 on Neural Networks for Sequences in Probabilistic Machine Learning: An Introduction by Kevin Murphy, 2022