CS 274A: Probabilistic Learning: Theory and Algorithms

Course Goals:

Students will develop a comprehensive understanding of probabilistic approaches to learning from data. Probabilistic learning is a key component in many areas within modern computer science, including artificial intelligence, data mining, speech recognition, computer vision, bioinformatics, and so forth. The course will provide a tutorial introduction to the basic principles of probabilistic modeling and then demonstrate the application of these principles to the analysis, development, and practical use of machine learning algorithms. Topics covered will include probabilistic modeling, defining likelihoods, parameter estimation using likelihood and Bayesian techniques, probabilistic approaches to classification, clustering, and regression, and related topics such as model selection, bias/variance, and density estimation. Knowledge of basic concepts in probability, multivariate calculus, and linear algebra is highly recommended for this course. Homeworks will make use of the MATLAB programming environment (no prior knowledge of MATLAB is required).
 

Syllabus (very much subject to change):

Week
Tuesday Thursday
  Notes


Week 1:
Introduction
Review of Probability: random variables, conditional and joint probabilities, Bayes rule, law of total probability, chain rule and factorization. Different interpretations of probability: frequentist and Bayesian views.
Multivariate Models
Working with sets of random variables. The multivariate Gaussian model. Independence concepts and graphical models, naive Bayes, Markov models.
Required Reading:
Note Set 1

Optional Reading/References
Pages 1 to 30 of the text
Week 2:
Learning from Data
Models and parameters. Concepts of bias and variance. Definition of the likelihood function. Basic principles of parameter estimation.
 
Maximum Likelihood Learning: Part 1
How to use maximum likelihood methods to learn the parameters of univariate Gaussian, binomial, and other parametric models.



Required Reading
Note Set 2

Optional Reading:
Article on the use of naive Bayes models in spam email filtering
Week 3:
Maximum Likelihood Learning: Part 2
Maximum likelihood methods for multivariate Gaussian models and multinomial models. 
Bayesian Learning: Part 1
General principles of Bayesian estimation: prior densities, posterior densities, maximum a posteriori (MAP), MPE, fully Bayesian approaches.
 

Required Reading:
Note Set 3

Optional Reading:
Tutorial paper on maximum likelihood estimation

 
Week 4:
Bayesian Learning: Part 2
Beta/binomial, Dirichlet/multinomial examples.
Bayesian Learning: Part 3
Bayesian estimation (continued): estimation of Gaussian parameters. Recursive updating. Prediction with Bayes. Computational issues. Sampling methods.

Required Reading:

Bishop text: pages 67-80 and 97-102

 
Week 5:
Regression Modeling: Part 1
Linear models. Normal equations. Systematic and stochastic components. Predicting a binary variable: logistic regression and neural networks.
Midterm Exam (in class)

Week 6:
Regression Modeling: Part 2
  Parameter estimation methods for regression. Maximum likelihood and Bayesian interpretations. Models for time-series data: auto-regressive models.

Decision Theory and Classification
Introduction, Bayes rule, Bayes decisions, optimality, risk. Bayes error rate, classification boundaries, discriminant functions 
 

Required Reading:
Bishop text: pages 137-165

Optional Reading:
Pages 1 to 33 of the Bias/Variance dilemma paper

Tipping's review paper on Bayesian regression

Draper's paper on Bayesian modeling

Week 7:
Classification
Introduction, Bayes rule, Bayes decisions, optimality, risk. Bayes error rate, classification boundaries, discriminant functions 
Bayes Error, Classification with Gaussian Models
 Gaussian classifiers, linear/quadratic boundaries. Relation to logistic and neural network models.
DHS Chapter 2.1 to 2.7 on classification



Week 8:
Classification continued
Classification continued

Papers on logistic regression:

Logistic regression for high-dimensional text data, by Genkin, Lewis, and Madigan

Logistic regression for high-dimensional data, PhD thesis by Paul Komarek

 

Week 9:
Mixture Models and EM: Part 1
K-means clustering. Mixtures of Gaussians and the associated EM algorithm. Clustering applications.
Mixture Models and EM: Part 2
Mixtures of conditional independence models. The EM algorithm. Applications to text data. Underlying theory of the EM algorithm.
  Optional Reading:
Fraley and Raftery paper on model-based clustering
 
Even more optional reading:
Sam Roweis's tutorial notes on unsupervised learning
Jeff Bilmes's tutorial notes on EM
Frank Dellaert's tutorial notes on EM
Week 10:

Topics we may not get to...


Applications: Modeling Text
Markov models and hidden variable models for text. Applications to classification and clustering of documents.

 

More topics we may not get to...


Applications: Tracking Moving Objects
Kalman filter models (linear-Gaussian state-space models). Applications to analyzing video data.

Tutorial articles on Kalman filters:
Chapter 1 from Maybeck text
Kalman filter notes from Max Welling
Finals Week
Final Exam (in class)





Texts

Grading Policy

Grades are based on a combination of homeworks, exams, and projects: 40% homeworks, 30% midterm, and 30% final. Note that a requirement for this course is that students learn and use MATLAB for their projects. MATLAB is a flexible high-level scientific programming language, available on certain ICS Unix machines, the NACS "gradEA" graduate Unix server, and in the NACS MSTB A and B PC labs.
 

Academic Honesty 

Academic honesty is taken seriously. For homework problems or programming assignments, you are allowed to discuss the problems or assignments verbally with other class members, but under no circumstances may you look at or copy anyone else's written solutions or code relating to homework problems or programming assignments. All problem solutions and code submitted must be material you have personally written during this quarter, except for any library or utility functions which we supply. Failure to adhere to this policy can result in a student receiving a failing grade in the class. It is the responsibility of each student to be familiar with UCI's current academic honesty policies. Please take the time to read the current UCI Senate Academic Honesty Policies (in the Spring Schedule of Classes, a few pages from the end). You may also want to look at the ICS Department's policies on cheating.

UCI Catalog Course Description

Probabilistic Learning: Theory and Algorithms: A unified probabilistic framework for learning algorithms. Classical pattern recognition algorithms, probabilistic mixture models, kernel methods, hidden Markov models, among others. Multivariate data analysis concepts for classification and clustering. Methodologies such as cross-validation and bootstrap. Prerequisites: basic calculus and linear algebra.