Note: Start this homework early. You may have to use the computers at school, but you can also download the free, open-source Weka software.
Also read over all the questions before you start; otherwise you may have to repeat some experiments.
In this homework you will use the Weka software to analyze the iris data set, a standard data set used to evaluate statistical algorithms. This data set is provided with Weka.
For each of the algorithms below, report the accuracies using 2-fold and 10-fold
cross-validation (CV).

• ZeroR, which is the dumbest algorithm of all. It's the baseline.
• k-Nearest Neighbor (IBk) with k = 1, k = 3, and k = 5. In the IBL folder.
• J48: the decision tree algorithm. In the Trees folder.
• PART: the algorithm for generating rules. In the Rules folder.
• Naïve Bayes: a statistical approach using Bayes' rule. In the Bayes folder.
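
As a rough illustration of what k-fold cross-validation measures, here is a minimal Python sketch using the ZeroR baseline. It is not Weka's implementation: it assumes a simple round-robin fold assignment (Weka shuffles the data before splitting), and the function names and the toy label list are hypothetical, made up for this example only.

```python
from collections import Counter


def k_fold_indices(n_examples, k):
    """Assign example indices to k folds round-robin (an assumed, simplified scheme)."""
    folds = [[] for _ in range(k)]
    for i in range(n_examples):
        folds[i % k].append(i)
    return folds


def zero_r_predict(train_labels):
    """ZeroR ignores all attributes and always predicts the most common training label."""
    return Counter(train_labels).most_common(1)[0][0]


def cross_validate_zero_r(labels, k):
    """Return the fraction of examples classified correctly over all k test folds."""
    folds = k_fold_indices(len(labels), k)
    correct = 0
    for test_fold in folds:
        held_out = set(test_fold)
        # Train on every example that is not in the current test fold.
        train_labels = [lab for i, lab in enumerate(labels) if i not in held_out]
        prediction = zero_r_predict(train_labels)
        correct += sum(1 for i in test_fold if labels[i] == prediction)
    return correct / len(labels)


# Toy labels (hypothetical, NOT the iris data): 6 of class "a", 4 of class "b".
toy_labels = ["a", "b", "a", "b", "a", "a", "b", "a", "b", "a"]
accuracy_2fold = cross_validate_zero_r(toy_labels, 2)
accuracy_10fold = cross_validate_zero_r(toy_labels, 10)
```

Each example is used for testing exactly once, and the classifier that labels it never saw it during training. With more folds, each training set is a larger share of the data.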

• Do you expect that 2-fold or 10-fold CV will yield a higher estimate of the accuracy of the algorithm?
• Why?
• Does your data support this conclusion? Be specific.
• Which learning methods produced interpretable results?
• From the 10-fold CV data, order the algorithms by accuracy.
• For the remaining questions, consider only the decision tree algorithm with 10-fold CV. Report the confusion matrix for the decision tree algorithm.
• List, in order, the classes predicted with highest precision, i.e., the probability that the example was
of class "a" given that the algorithm predicted it was of class "a". Show how the probabilities were computed.
• List, in order, the classes predicted with highest recall, i.e., the probability that the example was
predicted to be of class "a" given that it was of class "a". Show how the probabilities were computed.
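
The precision and recall definitions above can be read directly off a confusion matrix. The following Python sketch shows the arithmetic under the assumption that rows are actual classes and columns are predicted classes (Weka labels its columns "classified as", but check this against your own output). The counts are made up for illustration and are not results for the iris data; `precision_recall` is a hypothetical helper name.

```python
def precision_recall(confusion, class_names):
    """Compute per-class (precision, recall).

    Assumes confusion[i][j] counts examples of actual class i predicted as class j.
    """
    results = {}
    for j, name in enumerate(class_names):
        true_positives = confusion[j][j]
        predicted_as_j = sum(row[j] for row in confusion)  # column total
        actually_j = sum(confusion[j])                      # row total
        # Precision: P(actual = j | predicted = j)
        precision = true_positives / predicted_as_j if predicted_as_j else 0.0
        # Recall: P(predicted = j | actual = j)
        recall = true_positives / actually_j if actually_j else 0.0
        results[name] = (precision, recall)
    return results


# Made-up counts for illustration only; these are NOT Weka's results on iris.
example_matrix = [
    [10, 0, 0],  # actual "a"
    [0, 8, 2],   # actual "b"
    [0, 1, 9],   # actual "c"
]
scores = precision_recall(example_matrix, ["a", "b", "c"])
by_precision = sorted(scores, key=lambda name: scores[name][0], reverse=True)
by_recall = sorted(scores, key=lambda name: scores[name][1], reverse=True)
```

In this made-up matrix, class "b" has precision 8/9 (column total 9) and recall 8/10 (row total 10), so the two orderings differ: a class can rank higher on precision than on recall.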