HW5
Due: 5/31 11:00am EEE Dropbox
Nearest-neighbor action recognition
In this assignment, you will implement an action/gesture recognition system using a nearest-neighbor (NN) classifier. You will use the "action" matfile from the project video set. You will explore different aspects of nearest neighbor classification, including smoothing and the choice of distance functions.
Overview: You will be given skeleton code here. The high-level script "hw5.m" is a wrapper script that reads in the action matfile and constructs a set of video features 'x' and action labels 'y'. The dataset contains 80 video clips. You will evaluate your system using leave-one-out cross validation: given a dataset of N labeled examples, hold out one clip, use the remaining N-1 clips as training data to predict the label of the held-out clip, and record whether the prediction is correct. Repeat for all N choices of the held-out clip and report the overall accuracy.
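The leave-one-out loop described above can be sketched as follows. This is an illustrative Python sketch, not the MATLAB skeleton from "hw5.m"; the names leave_one_out_accuracy, classify, and nearest_scalar are hypothetical, and the toy scalar features stand in for real video features.

```python
def leave_one_out_accuracy(x, y, classify):
    """Fraction of examples whose held-out prediction matches the true label.

    `classify(train_x, train_y, test_x)` is a hypothetical callback that
    returns one predicted label for one test example.
    """
    n = len(x)
    correct = 0
    for i in range(n):
        # Hold out example i and train on the remaining n-1 examples.
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if classify(train_x, train_y, x[i]) == y[i]:
            correct += 1
    return correct / n


def nearest_scalar(train_x, train_y, test_x):
    # Toy 1-NN on scalar features, used only to exercise the loop.
    best = min(range(len(train_x)), key=lambda j: abs(train_x[j] - test_x))
    return train_y[best]


# Two well-separated toy classes (assumed data, not the action dataset).
x = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
y = [1, 1, 1, 2, 2, 2]
accuracy = leave_one_out_accuracy(x, y, nearest_scalar)
print(accuracy)
```

The error rate asked for in the questions below is one minus this accuracy.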
Helper functions
- class_confusion.m [10 pts]
This function will construct a K by K matrix, where entry (i,j) is the fraction of clips of class 'i' that are labeled as class 'j'. The diagonal entries therefore give the per-class accuracy, and the off-diagonal entries give the errors. This class-confusion matrix is often used to evaluate the behaviour of a multiclass classification system.
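As a rough illustration of the matrix described above, here is a small Python sketch (the assignment itself expects a MATLAB function). It assumes labels are integers 1..K and that "fraction" means each row is normalized by the number of clips of that true class; both are assumptions, and the function signature is hypothetical.

```python
def class_confusion(true_labels, predicted_labels, num_classes):
    """Row-normalized confusion matrix for integer labels in 1..num_classes."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        counts[t - 1][p - 1] += 1

    confusion = []
    for row in counts:
        total = sum(row)
        # A class with no true examples is left as a row of zeros.
        confusion.append([c / total if total else 0.0 for c in row])
    return confusion


# Toy labels (assumed data): one class-1 clip is labeled 2, one class-3 clip is labeled 1.
true_labels = [1, 1, 2, 2, 3, 3]
predicted_labels = [1, 2, 2, 2, 3, 1]
C = class_confusion(true_labels, predicted_labels, 3)
for row in C:
    print(row)
```

Each row with at least one true example sums to 1, so a perfect classifier would produce the identity matrix.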
Basic 1-NN classifier
- classifyNN_cen.m [20 pts]
This main function will estimate the label of a video clip, given a collection of training video clips and their labels. It does so by computing the SSD (sum of squared differences) cost between the center frame of the new clip and the center frame of each training clip, after resizing both frames to the same size. For speed, it may be helpful to pick a small image size. It then returns the label of the closest matching training clip.
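The center-frame procedure above might look roughly like this in Python (illustrative only; the hand-in is a MATLAB function). The sketch assumes each clip is a list of grayscale frames stored as 2-D lists, and it uses a simple nearest-neighbor resize as a stand-in for whatever resizing routine you choose. All function names are hypothetical.

```python
def resize_nearest(frame, out_h, out_w):
    """Nearest-neighbor resize of a 2-D list of grayscale values."""
    in_h, in_w = len(frame), len(frame[0])
    return [[frame[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def ssd(a, b):
    """Sum of squared differences between two equally sized 2-D lists."""
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def center_frame(clip, size):
    """Middle frame of the clip, resized to size = (height, width)."""
    return resize_nearest(clip[len(clip) // 2], size[0], size[1])


def classify_nn_center(train_clips, train_labels, test_clip, size=(2, 2)):
    """Label of the training clip whose center frame has the lowest SSD cost."""
    test = center_frame(test_clip, size)
    costs = [ssd(center_frame(c, size), test) for c in train_clips]
    return train_labels[costs.index(min(costs))]


def constant_clip(value, n_frames, h, w):
    # Toy clip whose every pixel has the same value (assumed data).
    return [[[value] * w for _ in range(h)] for _ in range(n_frames)]


train_clips = [constant_clip(0, 3, 4, 4), constant_clip(10, 5, 6, 6)]
train_labels = [1, 2]
label_bright = classify_nn_center(train_clips, train_labels, constant_clip(9, 3, 4, 4))
label_dark = classify_nn_center(train_clips, train_labels, constant_clip(1, 7, 8, 8))
print(label_bright, label_dark)
```

The `size` argument is the knob to tune when searching for an image size that is both accurate and fast.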
What to hand in: Hand in all the completed functions above, complete with comments. You will also need to hand in code for the various extensions described below. Also hand in plots of the class-confusion matrices. For both the basic classifier and the proposed extensions, use the confusion matrix to explain whether or not the system is making reasonable errors (e.g., confusing two actions that appear similar).
Q1. Implement the basic 1-NN classifier. Search for an image size that provides good performance while being reasonably fast. Report back the error rate, the best image size, a plot of the class confusion matrix, and an explanation of the errors. [10 pts answer to question]
Q2. (Dynamics) The basic classifier only scores a single static frame, and so ignores the dynamics of the action. A simple way to incorporate dynamics is to compute the SSD over the center 'w' frames, where 'w' = 1,3,5, or higher. Implement this extension to classifyNN_cen.m, and find the optimal 'w'. Report back the error rate, the optimal 'w', a plot of the class confusion matrix, and an explanation of the errors. [20 pts code] [10 pts answer to question]
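One possible reading of "SSD over the center 'w' frames" is sketched below in Python (illustrative only, not the MATLAB extension to classifyNN_cen.m). It assumes every frame has already been resized and flattened to a list of pixel values, and that each clip has at least 'w' frames; the helper names are hypothetical.

```python
def center_window(clip, w):
    """The w consecutive frames centered in the clip (w is normally odd)."""
    if w > len(clip):
        raise ValueError("clip has fewer than w frames")
    start = (len(clip) - w) // 2
    return clip[start:start + w]


def window_ssd(clip_a, clip_b, w):
    """SSD summed frame by frame over the center w frames of both clips."""
    total = 0
    for frame_a, frame_b in zip(center_window(clip_a, w), center_window(clip_b, w)):
        total += sum((pa - pb) ** 2 for pa, pb in zip(frame_a, frame_b))
    return total


# Toy clips with one pixel per frame (assumed data).
clip_a = [[0], [1], [2], [3], [4]]
clip_b = [[0], [1], [5], [3], [9]]
costs = {w: window_ssd(clip_a, clip_b, w) for w in (1, 3, 5)}
print(costs)
```

With w = 1 this reduces to the basic center-frame cost, so the basic classifier is a special case of the extension.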
Q3. (Smoothing) NN-classifiers have a tendency to overfit to the training data. One way to reduce the overfitting is to report back the most common label among the K closest neighbors of a test example (where K is typically 1,3,5, or higher). Implement this extension to classifyNN_cen.m, and find the optimal 'K'. Report back the error rate, the optimal 'K', a plot of the class confusion matrix, and an explanation of the errors. [20 pts code] [10 pts answer to question]
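The voting step can be sketched as below, in Python for illustration (the hand-in is MATLAB). It assumes the distances from the test clip to every training clip have already been computed. The tie-breaking rule, which prefers the tied label whose nearest member is closest, is an assumption of this sketch and is not specified by the assignment.

```python
from collections import Counter


def knn_vote(distances, labels, k):
    """Most common label among the k training examples with smallest distance."""
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    nearest = [labels[i] for i in order[:k]]
    counts = Counter(nearest)
    top = max(counts.values())
    # Ties are broken in favor of the label that appears first in distance order.
    for label in nearest:
        if counts[label] == top:
            return label


# Toy distances and labels (assumed data).
distances = [0.1, 0.5, 0.6, 0.7, 2.0]
labels = [3, 1, 1, 2, 2]
votes = {k: knn_vote(distances, labels, k) for k in (1, 3, 5)}
print(votes)
```

In this toy example the single closest neighbor has label 3, but with K = 3 it is outvoted by two neighbors with label 1, which is the smoothing effect the question refers to.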
Q4. (Distance functions) SSD may incorrectly penalize temporal misalignment between a training video and a test video. One simple method of alignment is to define the distance between a test and a training point to be the minimum distance over all 'w'-frame sub-clips within each video. Implement a new classifyNN_best.m function, using the optimal choice of 'w' and 'K' from above. Note that this will significantly increase the running time, so I advise you to debug on a small training set first. Report back the error rate, a plot of the class confusion matrix, and an explanation of the errors. [20 pts code] [10 pts answer to question]
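A brute-force version of this alignment distance is sketched below in Python (illustrative only; classifyNN_best.m itself is a MATLAB function). It assumes frames are already resized and flattened to lists of pixel values and that both clips have at least 'w' frames; the helper names are hypothetical.

```python
def subclips(clip, w):
    """All contiguous w-frame sub-clips of a clip."""
    return [clip[s:s + w] for s in range(len(clip) - w + 1)]


def clip_ssd(a, b):
    """SSD between two sub-clips with the same number of frames."""
    return sum((pa - pb) ** 2 for fa, fb in zip(a, b) for pa, pb in zip(fa, fb))


def min_subclip_distance(clip_a, clip_b, w):
    """Minimum SSD over every pairing of w-frame sub-clips from the two clips."""
    return min(clip_ssd(sa, sb)
               for sa in subclips(clip_a, w)
               for sb in subclips(clip_b, w))


# The same 3-frame motion occurs late in clip_a and early in clip_b (assumed data).
clip_a = [[0], [0], [1], [2], [3]]
clip_b = [[1], [2], [3], [0], [0]]
aligned = min_subclip_distance(clip_a, clip_b, 3)
centered = clip_ssd(clip_a[1:4], clip_b[1:4])  # center 3 frames only
print(aligned, centered)
```

For clips of lengths Ta and Tb this evaluates (Ta-w+1)(Tb-w+1) sub-clip pairs per training clip, which is where the extra running time comes from.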
Q5. [EXTRA-CREDIT] We have looked at a variety of scoring functions beyond SSD in this class, including NCC, chamfer, etc. Find one that improves upon the best error rate you obtained above. [10 pts code] [10 pts answer to question]
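As one example of an alternative score, here is a Python sketch of a zero-mean normalized cross-correlation (NCC) turned into a distance. This is a common textbook form and may differ in detail from the definition used in class; the function names are hypothetical, and frames are assumed to be flattened to lists of pixel values of equal length.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length pixel lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    norm = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if norm == 0:
        # A constant frame has no contrast; treat it as uncorrelated (assumption).
        return 0.0
    return sum(p * q for p, q in zip(da, db)) / norm


def ncc_distance(a, b):
    """Distance in [0, 2]: 0 for perfectly correlated frames."""
    return 1.0 - ncc(a, b)


# Toy frames (assumed data): b is a brighter copy of a, c is a reversed copy.
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
c = [4, 3, 2, 1]
print(ncc_distance(a, b), ncc_distance(a, c))
```

Unlike SSD, this score is unchanged when one frame is a brightness-scaled or brightness-shifted copy of the other, which may or may not help on this dataset.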