Introduction to Machine Learning and Data Mining, Winter 2007

Course: ICS 178

Instructor: Max Welling

Lecture: TuTh 3:30-4:50pm, PSCB 220

Discussion: W 3:00-3:50pm, ICS 243


Prerequisites: ICS 6A/Mathematics 6A, Mathematics 6B, Mathematics 6C or 3A, Mathematics 2A-B, Statistics 67/Mathematics 67.


The goal of this class is to familiarize you with state-of-the-art machine learning techniques for classification, regression, clustering and dimensionality reduction.
You will implement a number of these algorithms on the Netflix problem, participate in group discussions and give presentations.
By the end of the class you will be able to apply these techniques to new problems in academia and industry.

Homework: Please see the slides.

Projects: The Netflix problem (see the Netflix site).

Code: Here is code to load the Netflix data into memory (requires about 1 GB of RAM). Some notes by Jeff Taggert.
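The linked loader is not reproduced here. As a rough illustration of how the ratings can be held compactly in memory, here is a minimal Python sketch. It assumes the Netflix Prize per-movie file layout (a first line `<movie_id>:` followed by lines `CustomerID,Rating,Date`); the function name `load_movie_file` and the sample text are hypothetical.

```python
from array import array

def load_movie_file(text, movies, users, ratings):
    """Append the ratings from one per-movie file to compact typed arrays.

    Assumed format: first line "<movie_id>:", then one
    "<customer_id>,<rating>,<date>" line per rating.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    movie_id = int(lines[0].rstrip(":"))          # header line, e.g. "1:"
    for ln in lines[1:]:
        customer, rating, _date = ln.split(",")   # the date is ignored here
        movies.append(movie_id)
        users.append(int(customer))
        ratings.append(int(rating))

# "i" is a C int (4 bytes on common platforms); "b" is a signed byte, enough for ratings 1-5.
movies, users, ratings = array("i"), array("i"), array("b")
load_movie_file("1:\n101,3,2005-09-06\n202,5,2005-05-13\n", movies, users, ratings)
print(list(users), list(ratings))  # [101, 202] [3, 5]
```

Typed arrays cost a few bytes per rating, whereas one Python tuple per rating would cost many times more. That difference matters when the whole training set has to fit in memory.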

Syllabus: (subject to change)

1: Introduction: overview, examples, goals, probability, conditional independence, matrices, eigenvalue decompositions [slides Lec 1] [slides Lec 2]

2: Optimization and Data Visualization: stochastic gradient descent, coordinate descent, centering, sphering, histograms, scatter plots. [slides Lec 3] [slides Lec 4]

3: Least Squares Regression, Logistic Regression, Least Squares Matrix Factorization [slides Lec 5] [slides Lec 6]

4: Clustering: k-means and soft clustering (EM), MDL penalty. [slides Lec 7]

5: Decision Trees & Boosting [slides Lec 8]

Classification I: empirical risk minimization, k-nearest neighbors, decision stumps, decision trees.

Classification II: random forests, boosting.

Neural networks: perceptron, logistic regression, multi-layer networks, back-propagation.

Regression: Least squares regression.

Dimensionality reduction: principal components analysis, Fisher linear discriminant analysis.

Reinforcement learning: MDPs, TD- and Q-learning, value iteration.

Bayesian methods: Bayes rule, generative models, naive Bayes classifier.
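Topics 2 and 3 (stochastic gradient descent and least squares matrix factorization) come together in the Netflix project. Below is a minimal Python sketch, not the course's code, that fits R[u][i] ≈ U[u]·V[i] to the observed ratings only, using SGD on the squared error. The rank `k`, learning rate `lr`, L2 penalty `lam`, epoch count, the toy ratings and the names `factorize` and `rmse` are illustrative assumptions.

```python
import random

def factorize(triples, n_users, n_items, k=2, lr=0.02, lam=0.01, epochs=500, seed=0):
    """Least squares matrix factorization R[u][i] ~ dot(U[u], V[i]), fit by SGD.

    triples: list of (user, item, rating) for the observed entries only.
    lam: L2 penalty on the factors (an assumed, commonly used regularizer).
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(triples)
    for _ in range(epochs):
        rng.shuffle(data)                      # visit ratings in random order
        for u, i, r in data:
            err = r - sum(a * b for a, b in zip(U[u], V[i]))
            for f in range(k):
                uf, vf = U[u][f], V[i][f]      # old values feed both updates
                U[u][f] += lr * (err * vf - lam * uf)
                V[i][f] += lr * (err * uf - lam * vf)
    return U, V

def rmse(triples, U, V):
    """Root mean squared error of the factorization on the given ratings."""
    se = sum((r - sum(a * b for a, b in zip(U[u], V[i]))) ** 2 for u, i, r in triples)
    return (se / len(triples)) ** 0.5

# Toy data: 7 of the 9 entries of a small 3x3 rating matrix.
data = [(0, 0, 2.0), (0, 1, 1.5), (0, 2, 1.0), (1, 0, 4.0),
        (1, 2, 2.0), (2, 1, 2.25), (2, 2, 1.5)]
U, V = factorize(data, n_users=3, n_items=3)
print("training RMSE:", rmse(data, U, V))
```

Each update touches only one user row and one item column, which is what makes SGD practical when the rating matrix is huge and mostly missing.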

Matlab Demos

LSRegression, testLSR, LSRegression_demo, LogRegression_demo, plotGauss1D, ginput2
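The Matlab demos are not reproduced here. For readers who want the core of least squares regression in a few lines, here is a minimal Python sketch for a single input variable. It uses the closed-form solution w = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², b = ȳ - w·x̄. The function name `ls_fit_1d` is hypothetical and is not a port of `LSRegression`.

```python
def ls_fit_1d(xs, ys):
    """Fit y ~ w*x + b by least squares (closed form, one input variable)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # centered cross-product
    sxx = sum((x - mx) ** 2 for x in xs)                     # centered sum of squares
    w = sxy / sxx
    b = my - w * mx
    return w, b

w, b = ls_fit_1d([0, 1, 2, 3], [1, 3, 5, 7])
print(w, b)  # 2.0 1.0
```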

Grading Criteria

Grading will be based on a combination of weekly homework, projects, a midterm and a final exam.


The textbook that will be used for this course is:

1. R.O. Duda, P.E. Hart, D. Stork: Pattern Classification

Optional side readings are:

2. Tom Mitchell: Machine Learning
3. D. MacKay: Information Theory, Inference and Learning Algorithms
4. C.M. Bishop: Neural Networks for Pattern Recognition
5. T. Hastie, R. Tibshirani, J.H. Friedman: The Elements of Statistical Learning
6. B.D. Ripley: Pattern Recognition and Neural Networks