Workload: There will be weekly written assignments and reading assignments. The reading will primarily be from the text, but some additional papers will be assigned. Homeworks are due at class time on Tuesdays. No late homework will be accepted, but your lowest-scoring homework will be dropped. There will be an open book final exam. All homework should be typed, except for drawings of figures. Students are required to do an individual project. Homeworks will be returned in the distribution center by the time of the next assignment. Many assignments will involve running algorithms from the Weka suite of Machine Learning programs and making intelligent comments about their behavior. You need not know Java to do this. Weka may already be installed on the ICS computers, but verify this yourself. The Weka site is at www.cs.waikato.ac.nz/ml/weka/index.html, and you can also download the source from that site.



Grading: Your grade will be determined by your homework scores, the final, your class participation, and your project. The structure of the final will be similar to the homeworks. In particular, you can expect to:

  • write pseudo-code for a variation of an algorithm covered in class;
  • do a space and time complexity analysis of some algorithm;
  • do a "gedanken" experiment, where you predict the behavior of an algorithm without running it;
  • identify which algorithm would be most appropriate for a problem and explain why.


    Responsibility: I will be roughly following the text, but you are responsible for the content of the text even if it is not covered in lecture. In particular, you should know and understand all of the emphasized terms. You are also responsible for the content of the lectures.


    Goals: By the end of this course you will be familiar with the main effective methods for classification and regression learning. In particular, for classification you will have learned about nearest neighbor, naive Bayes, rule, decision tree, perceptron, and support vector approaches. Many of these methods can also be adapted to regression. Finally, recent research in Machine Learning shows how to combine these methods to increase their effectiveness. We will also cover some exploratory data analysis methods, such as clustering and association rule finding.


    With respect to each learning method, you should understand its assumptions, limitations, goals, and effectiveness. Although you may not implement any algorithms, you should understand how the algorithms are implemented in sufficient detail to carry out imaginary variations, applications, and evaluations. You should develop a good intuitive understanding of the algorithms, and you should feel that you could implement each algorithm if you had sufficient time.

    A nice set of notes covering the more mathematical approaches to machine learning is available at http://www-2.cs.cmu.edu/~awm/tutorials/ and includes additional examples of some of the topics we will cover. Andrew Moore's PhD thesis was on regression, and regression is emphasized in his view of Machine Learning.



    Final Presentation/Project: Presentations should be short (15-20 minutes) and given with PowerPoint slides. The project should be accompanied by an appropriate paper. Each project should be based on some machine learning algorithm that is not covered in the text, or on the analysis of some novel data set. The algorithm may be your own, but more likely it will be one from the literature. Presentations and papers on algorithms from the literature should not simply regurgitate a paper; they should include appropriate background information. You should clearly identify the main contribution, the significance of the contribution, and the evidence supporting the conclusion. Place the paper in context. You might view this as a tutorial on the algorithm. Projects may also be an analysis of novel data. I have one such data set (protein spectra from mice with and without intestinal cancer), but many exist on the web, or perhaps use one from your own research project.

    Tentative Reading Schedule:

    Week   Readings             Topic
    1      Chapters 1 & 2       Intro to ML and Weka
    2      Chapter 4            Decision Trees
    3      Chapter 5.1          Rule Learning
    4      Chapters 5.2 & 5.3   Nearest Neighbor and Naive Bayes
    5      Chapters 5.4 & 5.5   Perceptron & Neural Nets
    6      Chapter 5.6          SVM
    7      Chapter 8            Clustering
    8-10                        Student Presentations


    Assignments should always be typed and are due at class time. Your lowest assignment score will be dropped. No late homework will be accepted. Chapters and page numbers refer to the Data Mining text. Do not write more than you need to, and do not include irrelevant information. If you believe your assignment was mis-graded, a distinct possibility, simply resubmit the homework with an explanation and I will look at it again; this needs to be done within one week of the assignment being returned. Or see me during office hours.

    Warning: only the assignment for the next week is guaranteed to be accurate. The other assignments may change as the class progresses.

    Project: The Weka software has changed substantially since the text was written, and the documentation is woefully behind. Your project is to explain a machine learning algorithm that is in the software but not explained in the text, or a machine learning algorithm that has appeared since 2000. Good sources would be papers from recent Machine Learning conferences, the Journal of Machine Learning Research, Machine Learning, IJCAI, or AAAI. You should expect to read several background papers. Your presentation, done in PowerPoint, will be about 15-20 minutes long, so you need to concentrate on the main idea. You will also have a write-up, where you have room to explain the details of the algorithm. Your write-up should not be just a regurgitation of the paper, but should include whatever background material is necessary. Papers typically assume that the reader has a good background in the subject; instead, your paper should assume that the reader knows introductory AI and what we have covered in the Machine Learning course. Everyone will do a different algorithm, so you need to clear the project with me by the third week of class. You need only identify an algorithm and include the appropriate reference(s) that will be the basis of your paper and talk.


    The Final: This is an open book, open notes exam. You can expect to do some problems that are similar to the homeworks. For example, I will describe WHAT should be computed, and you will need to write pseudo-code for the algorithm and do both a time and a space complexity analysis. You will also be asked general questions about the various algorithms that we have studied, and you may be asked to modify an algorithm to achieve a different result. If you have been attending class and doing OK on the homework, you should do OK on the final as well. I welcome questions in class or via email.


    Week 1. Read chapters 1 and 2. This assignment has two parts. You are to generate or find two data sets for classification. In one data set, the decision tree algorithm (j48) should do well and nearest neighbor (ib1) should do poorly; in the other data set, the converse should be true. You will need to download the free, open-source Weka software to do this. You should define the two data sets, give an argument for why one algorithm is better than the other on each particular data set, and finally verify your conclusion by doing a 10-fold cross-validation. Weka outputs a lot of information; only include the information that you use to support your arguments. For example, don't report the kappa statistic unless you tell me what it means and why you used it. This is important. Do not include superfluous information. Ok?
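
    If you want to script this experiment rather than use the Explorer GUI, the following minimal Java sketch shows one way to run the 10-fold cross-validation for both classifiers. It assumes a Weka 3.x jar on the classpath; the ARFF file path is a placeholder for your own data set.

        import java.io.BufferedReader;
        import java.io.FileReader;
        import java.util.Random;

        import weka.classifiers.Classifier;
        import weka.classifiers.Evaluation;
        import weka.classifiers.lazy.IB1;
        import weka.classifiers.trees.J48;
        import weka.core.Instances;

        public class CompareJ48IB1 {
            public static void main(String[] args) throws Exception {
                // Load an ARFF data set; the path is supplied on the command line.
                Instances data = new Instances(new BufferedReader(new FileReader(args[0])));
                data.setClassIndex(data.numAttributes() - 1); // class is the last attribute

                // Run the same 10-fold cross-validation on both classifiers.
                for (Classifier c : new Classifier[] { new J48(), new IB1() }) {
                    Evaluation eval = new Evaluation(data);
                    eval.crossValidateModel(c, data, 10, new Random(1));
                    System.out.println(c.getClass().getSimpleName() + ": "
                            + eval.pctCorrect() + "% correct");
                }
            }
        }

    The percent-correct line is exactly the kind of focused output to report; everything else Weka prints should appear in your write-up only if you use it in your argument.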


    Week 2.

    This problem set is typical of the type of problem I use on the final. You could make up similar problems of your own.
    1. The various measures of impurity have concentrated on generating small decision trees. Give a precise definition of a new heuristic measure that would likely create large decision trees. Give a qualitative argument for why this is true, and estimate its performance relative to the standard decision tree algorithm as well as to random decision trees. By estimate I only mean whether its performance is better, worse, or the same; provide a qualitative argument for your decision.
    2. Provide pseudo-code for an extension of the standard decision tree algorithm that does a k-move lookahead. The standard hill-climbing algorithm is a 1-move-lookahead algorithm (a sketch of this baseline appears after this problem set). Assume that you have N instances in the training set and that each instance is defined by M binary attributes. Provide an analysis of the time complexity of the algorithm with a 2-move lookahead; this can be done in a manner similar to the one in class. Hypothesize whether this algorithm will perform better or worse than the standard decision tree and give a qualitative argument why.
    3. Assume that all attributes are numeric. Let N be the number of examples and A be the number of attributes. What is the time complexity of forming a decision tree in this case?
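
    For reference in problems 2 and 3, here is a minimal sketch of the standard greedy (1-move lookahead) decision tree learner on binary attributes, using information gain as the impurity heuristic. The Java data representation here is hypothetical; the point is the structure of the greedy loop that a k-move lookahead would replace.

        import java.util.*;

        public class GreedyTree {
            // Hypothetical representation: binary attribute vector plus class label.
            record Instance(boolean[] attrs, int label) {}

            static class Node {
                int splitAttr = -1;   // -1 marks a leaf
                int leafLabel;
                Node falseChild, trueChild;
            }

            // Entropy of the class distribution: the usual impurity measure.
            static double entropy(List<Instance> data) {
                Map<Integer, Integer> counts = new HashMap<>();
                for (Instance i : data) counts.merge(i.label(), 1, Integer::sum);
                double h = 0;
                for (int c : counts.values()) {
                    double p = (double) c / data.size();
                    h -= p * (Math.log(p) / Math.log(2));
                }
                return h;
            }

            static int majorityLabel(List<Instance> data) {
                Map<Integer, Integer> counts = new HashMap<>();
                for (Instance i : data) counts.merge(i.label(), 1, Integer::sum);
                return Collections.max(counts.entrySet(),
                        Map.Entry.comparingByValue()).getKey();
            }

            // Greedy induction: each node tries every unused attribute once
            // (a 1-move lookahead) and keeps the split with the highest
            // information gain, giving roughly O(M * N) work per node for
            // N instances and M binary attributes.
            static Node build(List<Instance> data, Set<Integer> unused) {
                Node node = new Node();
                node.leafLabel = majorityLabel(data);
                if (unused.isEmpty() || entropy(data) == 0.0) return node;

                int best = -1;
                double bestGain = 0.0;
                List<Instance> bestT = null, bestF = null;
                for (int a : unused) {
                    List<Instance> t = new ArrayList<>(), f = new ArrayList<>();
                    for (Instance i : data) (i.attrs()[a] ? t : f).add(i);
                    double gain = entropy(data)
                            - (t.size() * entropy(t) + f.size() * entropy(f)) / data.size();
                    if (gain > bestGain) { bestGain = gain; best = a; bestT = t; bestF = f; }
                }
                if (best == -1) return node; // no split helps: stay a leaf

                node.splitAttr = best;
                Set<Integer> rest = new HashSet<>(unused);
                rest.remove(best);
                node.trueChild = build(bestT, rest);
                node.falseChild = build(bestF, rest);
                return node;
            }
        }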

    Week 3.

    1. Describe the algorithm, with its associated paper, or the project that you intend to do.
    2. Fill out the Machine Learning Review form for the paper by Domingos on RISE.
      MACHINE LEARNING: REVIEW FORM

      Title:
      Authors:
      Reviewer's Name:____________________________

      GOALS. Does the author clearly specify the learning task on which he/she is focusing? Does he/she state the research goals he/she is trying to achieve?

      DESCRIPTION. Does the paper describe the method(s) in sufficient detail for readers to replicate the work? For instance, does it describe the inputs and outputs of the system? Does it clearly explain the algorithms used for performance and learning? Does the paper include enough examples?

      EVALUATION. Does the author carefully evaluate the approach to learning? For instance, does the author run systematic experiments, provide a careful theoretical analysis, show psychological validity, or give evidence of generality?

      DISCUSSION. Does the paper make contact with relevant earlier work, noting similarities, differences, and progress? Does it discuss the limitations of the approach along with its advantages? Does it consider the implications of the approach and outline directions for future work?

      GENERAL. Does the paper make a significant, technically sound contribution? Is the paper well-organized and well-written? Does it use standard terminology? Feel free to give additional comments.

      RECOMMENDATION: Accept____ Borderline____ Reject____

      CONFIDENCE: High _____ Medium _______ Low ______


    Week 4. In this homework you are asked to extend the 1R algorithm covered in class (for reference, a sketch of 1R appears after part (c) below). Assume that the data is in Weka format and that all attributes are nominal.

    a. Write pseudo-code for a "2R" algorithm, i.e. an algorithm that finds the optimal theory in which each rule has the form: if (Attribute1 = value) and (Attribute2 = value') then classK. Let T be the number of elements in the training data. Optimality is defined in the same way as for 1R. Note that the result of 2R is a set of rules, but every rule in the "theory" tests the same two attributes.

    b. Give a time/space complexity analysis of your pseudo-code.

    c. Without giving an algorithm, write down the time/space complexity of the "kR" algorithm. As before, this algorithm yields the optimal theory where each rule has k conditions.
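
    For reference, here is a minimal sketch of the 1R baseline covered in class, in Java with a hypothetical string-based data representation. 2R and kR generalize the rule bodies from one attribute test to two and to k tests; optimality (fewest training errors) is judged the same way.

        import java.util.*;

        public class OneR {
            // Hypothetical representation: nominal attribute values plus class label.
            record Instance(String[] attrs, String label) {}

            // A 1R theory: one chosen attribute, a value-to-class rule for each
            // of its values, and the number of training errors the rules make.
            record Theory(int attr, Map<String, String> valueToClass, int errors) {}

            static Theory learn(List<Instance> data, int numAttrs) {
                Theory best = null;
                for (int a = 0; a < numAttrs; a++) {
                    // counts.get(value).get(class) = frequency of (value, class)
                    Map<String, Map<String, Integer>> counts = new HashMap<>();
                    for (Instance i : data)
                        counts.computeIfAbsent(i.attrs()[a], v -> new HashMap<>())
                              .merge(i.label(), 1, Integer::sum);

                    // Each value predicts its majority class; the rest are errors.
                    Map<String, String> rules = new HashMap<>();
                    int errors = 0;
                    for (var e : counts.entrySet()) {
                        Map<String, Integer> byClass = e.getValue();
                        String majority = Collections.max(byClass.entrySet(),
                                Map.Entry.comparingByValue()).getKey();
                        rules.put(e.getKey(), majority);
                        errors += byClass.values().stream()
                                .mapToInt(Integer::intValue).sum() - byClass.get(majority);
                    }
                    if (best == null || errors < best.errors())
                        best = new Theory(a, rules, errors);
                }
                return best; // the single-attribute theory with fewest training errors
            }
        }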



    Week 5.

    1. Suppose a data set has only numeric attributes and two classes, and suppose that it is linearly separable. Prove that the space of solutions (linear separators) is convex. Recall: a set S is convex if for any two points p and q in S, the point a*p + (1-a)*q also belongs to S, where 0 <= a <= 1.
    2. The standard Perceptron algorithm was presented after a data transformation: the data was augmented with a constant component. This made the algorithm simpler to state and prove theorems about (a sketch of the augmented version appears after this problem set). Provide pseudo-code for the standard Perceptron algorithm where this data transformation has not occurred; you will now need to explicitly deal with updating the threshold. Your code should correspond exactly to the code presented in class.
    3. Using the cpu data set that is provided with Weka, display a table comparing the performance of the following regression algorithms (use 10-fold CV): least mean squares (LMS, or Linear Regression), IB1 and IB3, M5, and SMOreg with the exponent set to 1 (the default) and to 2. Does it matter which of the 5 performance measures is used? Which result is most interpretable?
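
    For reference in problem 2, here is a minimal sketch of the Perceptron in the augmented form presented in class: each input is extended with a constant-1 component so the threshold becomes an ordinary weight. The Java representation is hypothetical; labels are assumed to be +1/-1 and the data linearly separable, so the loop terminates by the perceptron convergence theorem.

        import java.util.Arrays;

        public class Perceptron {
            static double[] train(double[][] xs, int[] ys) {
                int d = xs[0].length + 1;       // +1 for the constant feature
                double[] w = new double[d];     // weights start at zero
                boolean mistake = true;
                while (mistake) {
                    mistake = false;
                    for (int i = 0; i < xs.length; i++) {
                        double[] x = Arrays.copyOf(xs[i], d);
                        x[d - 1] = 1.0;         // the augmentation
                        double dot = 0;
                        for (int j = 0; j < d; j++) dot += w[j] * x[j];
                        if (ys[i] * dot <= 0) { // misclassified (or on the boundary)
                            for (int j = 0; j < d; j++) w[j] += ys[i] * x[j];
                            mistake = true;
                        }
                    }
                }
                return w; // the last component plays the role of the (negated) threshold
            }
        }

    In the un-augmented version you are asked for, the constant component disappears and the threshold must be stored and updated as a separate variable.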


    Weeks 6-8. Prepare your paper and presentation.

    Weeks 9-10: Make your presentation in class. You should submit your PowerPoint slides to me the day before your presentation; this will make for smoother transitions between talks. In some cases you may want to use your own laptop, if you have a specific demo to run. Clear this with me first.
    Besides making a presentation, every student is required to comment on each presentation in writing. The goal of this exercise is to give the speaker helpful feedback so that the presentation can be improved. These comments will be anonymous and should address issues such as: goal, clarity, significance, and evidence. The comment form is given below. For each talk, your comments should take no more than one page in total. At the next class period, paper-clip your comment pages together and add a cover sheet that has just your name on it. Please put your comment pages in alphabetical order by the name of the speaker. I will re-sort the pages and return all comments via the distribution center.
    Speaker Name:


    Speaker Dates