Source: Darren Davis

Instructor: Max Welling

classification, regression, clustering and dimensionality reduction. Besides this, an important aspect

of this class is to provide a modern statistical view of machine learning.

exam. Most homework will revolve around the implementation of various

classification algorithms. It is required that you use MATLAB for this coding work.

**Presentation Schedule:**

**The following groups have been scheduled for a presentation on these days:**

**Th. May 28 : Viveck Cadambe Group and Yi Yang Group**

**Tu. June 2: Phitchayaphong Tantikul Group and David Keator**

**Th. June 4: Michael Zeller Group and Ullas Sankhla Group**

**If you are taking the exam but do not find yourself affiliated with one of these groups, you should contact me ASAP to get yourself scheduled.
Presentations can take 30 minutes each, with 10 minutes left for questions.
The presentation should be divided equally among the members of the group. Everyone is required to present a piece.
Note that each individual member will also have to write a report of at least 2 pages on his/her contribution to the project
(which will only be due in finals week).**

Week 1/2: [Slides: Intro, kNN, Logistic Regression, Overfitting][pdf] [Slides: Evaluation of Results][pdf]

Week 3/4: [Slides: Decision Trees, Bagging, Boosting][pdf] [Slides: Neural Networks] [pdf]

Week 5: [Lecture-notes Convex Optimization][Slides: Support Vector Machines] [pdf]

Week 6: [Lecture-notes Clustering (ps)][Slides: Unsupervised Learning] [pdf][Lecture-notes Kernel PCA]

Week 7: [Lecture-notes Kernel & Spectral Clustering][Lecture-notes Fisher LDA]

Week 8: [Lecture-notes Kernel Canonical Correlation Analysis]

There are 5 columns. The first 4 columns are feature values while the last value is the class label (1,2,3).
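The column layout described above can be sketched as a small loader. This is only an illustration in Python (the coursework itself is to be done in MATLAB); the whitespace- or comma-separated text format, the function names, and the sample values are assumptions, not part of the course materials.

```python
def split_features_labels(rows):
    """Split 5-column rows into 4 feature values and one class label (1, 2 or 3)."""
    features, labels = [], []
    for row in rows:
        if len(row) != 5:
            raise ValueError(f"expected 5 columns, got {len(row)}")
        label = int(float(row[4]))
        if label not in (1, 2, 3):
            raise ValueError(f"unexpected class label: {label}")
        features.append([float(v) for v in row[:4]])
        labels.append(label)
    return features, labels


def parse_text(text):
    """Parse text with one example per line (assumed whitespace- or comma-separated)."""
    rows = [line.replace(",", " ").split()
            for line in text.splitlines() if line.strip()]
    return split_features_labels(rows)


# Made-up values, used only to show the layout.
sample = """5.1 3.5 1.4 0.2 1
7.0 3.2 4.7 1.4 2
6.3 3.3 6.0 2.5 3"""

X, y = parse_text(sample)
print(X[0], y)
```

Keeping the features and labels in separate lists makes it easy to pass them to any classifier that expects a feature matrix and a label vector.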

This dataset is an excellent starting point for an image retrieval system. You can use the one with 17 categories

to test your algorithms on.

Thesis in Features for Image Retrieval

Suggested Image Features to Check out:

-Color Histograms

-Histograms of Scale Invariant Features (SIFT)

-Histograms of Image Gradients

-Texture Filter Banks / Textons

-Gist Features (Torralba)
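As a rough illustration of the first feature in the list above, here is a minimal color histogram sketch in Python (shown for illustration only; the coursework itself uses MATLAB). The pixel representation (a list of `(r, g, b)` tuples in 0..255), the bin count, and the function names are assumptions. Histogram intersection is used as one common way to compare two histograms; it is not prescribed by the course.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram of (r, g, b) pixels with values in 0..255."""
    b = bins_per_channel
    hist = [0.0] * (b ** 3)
    count = 0
    for r, g, bl in pixels:
        # Map each channel value to a bin index in 0..b-1.
        ri, gi, bi = r * b // 256, g * b // 256, bl * b // 256
        hist[(ri * b + gi) * b + bi] += 1
        count += 1
    if count == 0:
        raise ValueError("image has no pixels")
    return [h / count for h in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms of equal length."""
    return sum(min(a, c) for a, c in zip(h1, h2))


# Toy "images": mostly red versus mostly blue.
reddish = [(255, 0, 0)] * 3 + [(0, 0, 255)]
bluish = [(0, 0, 255)] * 3 + [(255, 0, 0)]

h_red = color_histogram(reddish, bins_per_channel=2)
h_blue = color_histogram(bluish, bins_per_channel=2)
print(histogram_intersection(h_red, h_red), histogram_intersection(h_red, h_blue))
```

A histogram discards all spatial layout, which is part of why it returns images that merely share colors with the query.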

Code for Spatial Pyramid

Paper on Spatial Pyramid

**Very fast method for retrieval:**
Large Scale Online Learning of Image Similarity Through Ranking

ICML paper on this method

**A set of images of simulated tumor growth is supplied below.
A possible project with some real relevance is to take these images and predict the class of
tumor that generated each one. I have held back some test data.**


We will hold a contest to see who performs best on the test data (without labels).

Prize: a bottle of champagne.

This paper on Earth Mover's Distance

Exercises: Homework 1[pdf] [answer sheet HW1]

This paper on Flower Classification

Exercises: Homework 2[pdf] [answer sheet HW2]

Week 3/4: Reading:

Bishop, Chapter 14 up to (not including) Sec. 14.5

Bishop, Chapter 5 up to (not including) Sec. 5.6

This paper on "Bagging Predictors"

This paper on Convolutional Networks

Exercises: Homework 3 [pdf] [answer sheet HW3]

Week 5: Reading:

Bishop Chapter 6

This technical note on convex optimization

This paper on fast retrieval

These notes on SVMs

Exercises: Homework 4 [pdf][answer sheet HW4]

Week 6: Reading:

Bishop Chapter 9

This classnote on clustering

Exercises: Homework 5 [pdf][answer sheet HW5]

Week 7: Reading:

Bishop Chapter 9

This classnote on Kernel-PCA

This classnote on Kernel & Spectral Clustering

Exercises: Homework 6 [pdf][answer sheet HW6]

Bishop Chapter 4, section 4.1 only

This classnote on Fisher-LDA

Exercises: Homework 7 [pdf][answer sheet HW7]

1. Search engines now store more images than a human will see in a lifetime

2. Almost everyone carries a digital camera of some sort in his/her pocket

Conclusion 1: there is (or will very soon be) an obvious need for a search tool that searches for information based

on an uploaded image. There is enough information on the internet to make this feasible.

Now consider this: have you ever been able to upload a picture into some website which then returned

related pictures or information about the objects in that picture? Not me, and I tried. Last year I had a mystery plant

in my garden and people claimed it was poisonous. I took a picture and tried to locate internet services that would

take my image and find webpages on the plant in question. Well, it didn't work. I got lots of red images, but very few plants.

I ended up going to a gardening center with my picture to find out that it was a Castor Bean (yes, that is very poisonous). Anyway,

it felt like this information should have been easier to obtain.

Conclusion 2: This problem must be very difficult (if not, such a tool would already exist).

There really are lots of cool applications of such a system. Imagine taking a picture of your kid's skin rash and finding

out via this tool what some likely candidates for its possible underlying disease are. Or, imagine a tool for Alzheimer's patients

who are having a hard time recognizing their family and friends. Or imagine you are on vacation in Rome and wish to know more about

that building whose name you really don't know.

So here is my challenge to you. Use the knowledge you learn in this class (and more) to build a very simple system of the above

kind. We will think about a nice restricted domain for which we can easily get data (California plants, Cars, Skin diseases, Faces).

We'll think about methods to use (which features to extract, how to build a useful kernel, what classification algorithm to use).

You can break up into groups of at most 5 students and divide the work. You will need to report your work through a presentation.

If we end up with systems that work reasonably well, we can build the actual tool as a demo and run it on a server. We can even

combine more than one system by averaging their results.

Anyway, things are still a bit open-ended right now, but it will be very instructive and lots of fun!
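The idea of combining several systems by averaging, mentioned above, can be sketched as follows. This is only an illustration in Python (the coursework itself uses MATLAB): it assumes each system outputs one score per class for a query image, with all systems using the same class order. The function name and the example scores are made up.

```python
def average_predictions(system_outputs):
    """Average per-class scores from several systems.

    system_outputs: list of score lists, one per system, all of equal length.
    Returns (averaged scores, index of the highest-scoring class).
    """
    if not system_outputs:
        raise ValueError("need at least one system")
    n_classes = len(system_outputs[0])
    if any(len(out) != n_classes for out in system_outputs):
        raise ValueError("all systems must score the same number of classes")
    n = len(system_outputs)
    avg = [sum(out[c] for out in system_outputs) / n for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: avg[c])
    return avg, best


# Three hypothetical systems scoring one query image over three classes.
outputs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.6, 0.3],
]
avg, best = average_predictions(outputs)
print(avg, best)
```

Here the first system alone would pick class 0, but the average favors class 1 because the other two systems agree on it.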

Additional background reading

Grading will be based on a combination of homework (20%), projects (30%) and a final exam (50%).

(This information may change depending on whether a reader will be assigned to this class.)

The textbook that will be used for this course is:

Optional side readings are:

3. R.O. Duda, P.E. Hart, D. Stork: Pattern Classification

4. C.M. Bishop: Neural Networks for Pattern Recognition

5. T. Hastie, R. Tibshirani, J.H. Friedman: The Elements of Statistical Learning

6. B.D. Ripley: Pattern Recognition and Neural Networks

7. T. Mitchell: Machine Learning.