Instructor: Max Welling

**Grading & Homework:**

Every week there will be homework, mostly consisting of a coding assignment. Coding will be done in MATLAB. Finishing all homework assignments is required to pass this class. If you do not hand in your homework before the weekly deadline, you will accumulate penalty points (or, viewed more positively, if you hand in your homework on time every week you will accumulate bonus points). I may ask students to demo their implementation of a homework assignment in class.

There will be short quizzes every week, starting in week 2, but no midterm or final exam. Quizzes will be multiple choice, and you will need to buy

There will also be a project, starting right from day 1. Details are given below. You may work in groups of up to 5 students, but every member must do their own coding work. Each group will also be required to give a presentation and write a report (as a group). It is important that the individual contributions of the group members are made transparent.

Your final grade will be determined as a combination of homework (about 20%), the project (about 40%), and quizzes (about 40%).

**Projects:**

Website for downloading data: http://cinquin.org.uk/static/datasets4welling/ .

There are two germlines, 44_unstraightened and 61_rb, each with four corresponding .mat files:

DAPI.mat - The original image of the germline

fullSeg.mat - The segmented image of the germline

coordinates.mat - The [y, x, z] coordinates of each of the n cells in the germline, stored as a 3xn matrix.

individual_DAPI_segmentations.mat - Each of the segmented cells, stored as a 61x61x61xn matrix, where each cell is contained in a 61^3 cube. The segmented part of the original image is overlaid on a black background.
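As a sketch of how these shapes fit together, the snippet below builds synthetic stand-ins with the documented dimensions and extracts one cell's cube and coordinate. It is written in Python/NumPy for illustration (the course itself uses MATLAB), and the variable names are assumptions; with the real data you would read the arrays from the .mat files instead (e.g. via `scipy.io.loadmat`, where the variable names inside each file may differ).

```python
import numpy as np

# Synthetic stand-ins with the documented shapes (here n = 3 cells).
# With the real data: d = scipy.io.loadmat('coordinates.mat'), etc.
n = 3
coordinates = np.random.rand(3, n)      # [y, x, z] per cell: a 3 x n matrix
cells = np.random.rand(61, 61, 61, n)   # one 61^3 cube per cell

# Extract the cube and coordinate for cell i:
i = 1
cube_i = cells[:, :, :, i]              # the i-th cell's 61x61x61 volume
y, x, z = coordinates[:, i]             # its [y, x, z] position
```

The last axis indexes cells in both arrays, so column i of `coordinates` corresponds to slab i of `cells`.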

If you want to join this team please contact Kevin Heins: <kaheins@gmail.com>

Predicting stock prices.

Old Data (no longer of any use)

Your task is to build a model that predicts the highest and lowest price of stock "a" over the next 5 minutes. There are 7 time series in total, but you are only asked to predict series "a"; the other series are correlated with "a", so you can use them to make predictions. The time series are aligned, and each has a number of attributes: opening and closing prices at 1-minute intervals, the highest and lowest price in each interval, the number of trades, and the total trading volume.

Given all information up to time "t", you are asked to predict the highest and lowest price of stock "a" over the next 5 minutes (5 time points). This means you can compute your regression label (response variable Y) as MAX[X(t+1), X(t+2), X(t+3), X(t+4), X(t+5)], where X is the attribute "highest price" in your data, and similarly for the MIN.

Given your predictions, the company involved will execute a trading strategy on data that you haven't seen and report how much money you would have made. More about this trading strategy, and how you may be able to test your own program, will follow later. For now you can start making predictions on the training data.
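The label construction above can be sketched in a few lines. The function below is a Python/NumPy illustration (the course itself uses MATLAB); the function and argument names are my own, and it assumes `high` and `low` are 1-D arrays of per-minute highest and lowest prices for stock "a".

```python
import numpy as np

def make_labels(high, low, horizon=5):
    """Regression labels for stock "a".

    For each time t, y_max[t] = MAX of 'high' over t+1 .. t+horizon,
    and y_min[t] = MIN of 'low' over the same window, matching the
    MAX[X(t+1), ..., X(t+5)] definition in the project description.
    Returns arrays of length len(high) - horizon.
    """
    high = np.asarray(high, dtype=float)
    low = np.asarray(low, dtype=float)
    T = len(high)
    y_max = np.array([high[t + 1 : t + 1 + horizon].max() for t in range(T - horizon)])
    y_min = np.array([low[t + 1 : t + 1 + horizon].min() for t in range(T - horizon)])
    return y_max, y_min

# Tiny check on made-up prices:
h = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 0.0])
l = h - 0.5
ym, yn = make_labels(h, l)  # ym = [6.0, 6.0], yn = [1.5, -0.5]
```

Note that only minutes t+1 through t+5 enter the label, never minute t itself; otherwise the model would be "predicting" information it already has.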

Week 1: Introduction, kNN, Logistic Regression, Overfitting, Xvalidation [ppt] [pdf]

Week 2: Decision Trees, Random Forests, Bagging, Boosting [ppt] [pdf]

Week 3: Neural Networks [ppt] [pdf]

Week 4: Convex Optimization (see appendix A of the classnotes), SVMs [ppt] [pdf] (chapter 8 of classnotes)

Week 5: Unsupervised Learning: k-means clustering & PCA [ppt] [pdf]

Week 6: PCA & Kernel PCA (classnotes) [matlab demo]

Week 7: Receiver Operating Characteristic (ROC) [ppt] [pdf] , Kernel Fisher Linear Discriminant Analysis (classnotes)

Week 8: Spectral & Kernel Clustering (classnotes).

Week 9: Comparing Classifiers [ppt] [pdf], Kernel Canonical Correlation Analysis (classnotes)

Week 10:


Week 2: Homework 2 [doc] [pdf]

Week 3: Homework 3 [doc] [pdf]

Week 4: Homework 4 [doc] [pdf]

Week 5: Homework 5 [doc] [pdf] (updated Th. Apr. 29)

Week 6: Homework 6 [doc] [pdf]

Week 7: Homework 7 [doc] [pdf] (updated Th. May 13)

Week 8: Homework 8 [doc] [pdf]

Week 9: Homework 9 [doc] [pdf]

Week 10:

The textbook that will be used for this course is:

Optional side readings are:

3. R.O. Duda, P.E. Hart, D. Stork: Pattern Classification

4. C.M. Bishop: Neural Networks for Pattern Recognition

5. T. Hastie, R. Tibshirani, J.H. Friedman: The Elements of Statistical Learning

6. B.D. Ripley: Pattern Recognition and Neural Networks

7. T. Mitchell: Machine Learning.