CS 274A: Probabilistic Learning: Theory and Algorithms

Course Goals:

Students will develop a comprehensive understanding of probabilistic approaches to learning from data. Probabilistic learning is a key component in many areas within modern computer science, including artificial intelligence, data mining, speech recognition, computer vision, and bioinformatics. The course will provide a tutorial introduction to the basic principles of probabilistic modeling and then demonstrate the application of these principles to the analysis, development, and practical use of machine learning algorithms. Topics covered will include probabilistic modeling, defining likelihoods, parameter estimation using likelihood and Bayesian techniques, probabilistic approaches to classification, clustering, and regression, and related topics such as model selection, bias/variance, and density estimation. Knowledge of basic concepts in probability, multivariate calculus, and linear algebra is strongly recommended for this course. Homeworks will make use of the MATLAB programming environment (no prior knowledge of MATLAB required).
 

Syllabus (subject to change):

Week | Tuesday | Thursday | Notes


Week 1:
Introduction
Review of Probability: random variables, conditional and joint probabilities, Bayes' rule, law of total probability, chain rule and factorization. Different interpretations of probability: frequentist and Bayesian views.
Multivariate Models
Working with sets of random variables. The multivariate Gaussian model.

Required Reading:
Note Sets 1 and 2 (above)

Optional Reading/References
Pages 1 to 30 of the Bishop text


Article on the use of naive Bayes models in spam email filtering

Week 2:
Graphical Models
Independence concepts, particularly conditional independence. Graphical models, with examples such as naive Bayes and Markov models.
Learning from Data
Models and parameters. Concepts of bias and variance. Definition of the likelihood function and the principle of maximum likelihood parameter estimation.


Required Reading
Note Sets 2 and 3

Week 3:
Maximum Likelihood Learning
How to use maximum likelihood methods to learn the parameters of Gaussian, binomial, multivariate, and other parametric models.
Maximum Likelihood Learning (part II)
 

Required Reading:
Note Set 3


Optional Reading:
Tutorial paper on maximum likelihood estimation

 
Week 4:
Bayesian Learning: I
General principles of Bayesian estimation: prior densities, posterior densities, MAP, MPE, fully Bayesian approaches. Beta/binomial
Bayesian Learning: II
Bayesian estimation (continued): estimation of Gaussian parameters.

Optional Reading:

Bishop text: pages 67-80 and 97-102

 
Week 5:
Bayesian Learning: III
Predictive densities, model selection, model averaging
Midterm Exam (in class)

Optional Reading:

Draper's paper on Bayesian modeling


Week 6:

Classification
Introduction, Bayes' rule, classification boundaries, discriminant functions


Classification II
Optimal decisions, Gaussian classifiers, Bayes error rate, discriminant functions
 


Optional Reading:
Bishop text: pages 179-184 and 196-204


Week 7:
Classification III
Class-conditional modeling. Likelihood-based approaches and properties of objective functions.

Classification IV
Logistic regression and neural network models.

Required Reading:
Overview notes on the logistic model

Optional Reading

Week 8:
Mixture Models and EM I
K-means clustering. Mixtures of Gaussians and the associated EM algorithm. Clustering applications. Mixtures of conditional independence models.

Mixture Models/EM II
The EM algorithm. Applications to text data. Underlying theory of the EM algorithm.

Optional Reading:

Fraley and Raftery paper on model-based clustering
 
Even more optional reading:
Sam Roweis's tutorial notes on unsupervised learning


Jeff Bilmes's tutorial notes on EM


Frank Dellaert's tutorial notes on EM

Week 9:
Regression Modeling: Part 1
Linear models. Normal equations. Systematic and stochastic components.
Regression Modeling: Part 2
Parameter estimation methods for regression. Maximum likelihood and Bayesian interpretations. Models for time-series data: auto-regressive models.

Optional Reading:

Pages 1 to 33 of the Bias/Variance dilemma paper

Tipping's review paper on Bayesian regression

 

Week 10:

Topics we may not get to...


Applications: Modeling Text
Markov models and hidden variable models for text. Applications to classification and clustering of documents.

 

More topics we may not get to...


Applications: Tracking Moving Objects
Kalman filter models (linear-Gaussian state-space models). Applications to analyzing video data.

Optional Reading:

Tutorial articles on Kalman filters:
Chapter 1 from Maybeck text
Kalman filter notes from Max Welling

Finals Week
Final Exam





Texts

Grading Policy

Grades are based on a combination of homeworks, exams, and projects: 40% homeworks, 30% midterm, and 30% final. Note that a requirement for this course is that students learn and use MATLAB for their projects. MATLAB is a flexible high-level scientific programming language, available on certain ICS Unix machines, the NACS "gradEA" graduate Unix server, and in the NACS MSTB A and B PC labs.
 

Academic Honesty 

Academic honesty is taken seriously. For homework problems or programming assignments you are allowed to discuss the problems or assignments verbally with other class members, but under no circumstances can you look at or copy anyone else's written solutions or code relating to homework problems or programming assignments. All problem solutions and code submitted must be material you have personally written during this quarter, except for any library or utility functions which we supply. Failure to adhere to this policy can result in a student receiving a failing grade in the class. It is the responsibility of each student to be familiar with UCI's current academic honesty policies. Please take the time to read the current UCI Academic Honesty Policies.

UCI Catalog Course Description

Probabilistic Learning: Theory and Algorithms: A unified probabilistic framework for learning algorithms. Classical pattern recognition algorithms, probabilistic mixture models, kernel methods, hidden Markov models, among others. Multivariate data analysis concepts for classification and clustering. Methodologies such as cross-validation and bootstrap. Prerequisites: basic calculus and linear algebra.