algo
Class AlgoClassifier

java.lang.Object
  |
  +--algo.AlgoClassifier
Direct Known Subclasses:
GiniClassificator

public abstract class AlgoClassifier
extends java.lang.Object
Base class for the algorithmic classifiers. All implemented classifiers inherit from this abstract class. If only the abstract methods 'calculateAttribute()', 'sortCategories(CategoryArray cat, Dataset[] sets, int nAttributIndex)' and 'getName()' are overridden, the tree is pruned with cost-complexity pruning (Breiman et al.: CART). A different pruning technique can be implemented by overriding the methods 'initPruning()' and 'optimizeTree()'.
Since:
PBC 2.1
Author:
Mihael Ankerst
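
The following is a minimal sketch of what a subclass that overrides only the three abstract methods might look like. Because the PBC classes are not reproduced on this page, 'Node', 'CategoryArray', 'Dataset' and 'AlgoClassifierStandIn' are hypothetical stand-ins that only mirror the documented signatures, and 'ExampleClassifier' and its method bodies are placeholders, not part of PBC.

```java
import java.util.Vector;

// Hypothetical stand-ins for the PBC types named on this page; the real
// classes are not reproduced here.
class Node {}
class CategoryArray {}
class Dataset {}

// Reduced stand-in for algo.AlgoClassifier that mirrors only the three
// abstract methods documented on this page.
@SuppressWarnings("rawtypes")
abstract class AlgoClassifierStandIn {
    public abstract void calculateAttribute(Node n, Vector vAttributes);
    public abstract int[] sortCategories(CategoryArray cat, Dataset[] sets, int nAttributIndex);
    public abstract String getName();
}

// Hypothetical subclass with placeholder bodies.
@SuppressWarnings("rawtypes")
class ExampleClassifier extends AlgoClassifierStandIn {
    @Override
    public void calculateAttribute(Node n, Vector vAttributes) {
        // A real implementation would evaluate the candidate attributes here
        // (all attributes if vAttributes is empty). How the result is attached
        // to the node is not specified on this page.
    }

    @Override
    public int[] sortCategories(CategoryArray cat, Dataset[] sets, int nAttributIndex) {
        // Placeholder: a real implementation returns the sorted category indices.
        return new int[0];
    }

    @Override
    public String getName() {
        // Name shown to the user when selecting a classifier.
        return "Example classifier";
    }

    public static void main(String[] args) {
        AlgoClassifierStandIn c = new ExampleClassifier();
        System.out.println(c.getName());
    }
}
```

With the real base class, such a subclass would inherit the pruning behaviour described above without overriding 'initPruning()' or 'optimizeTree()'.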

Field Summary
protected  float m_fPurity
          The maximum purity-threshold. 
protected  int m_nSetNumber
          The threshold for the number of records in a node. 
protected  boolean m_proposeSplit
          Flag indicating whether 'Propose Split' has been invoked.
protected  Pruner m_pruner
 
Constructor Summary
AlgoClassifier()
 
Method Summary
abstract  void calculateAttribute(Node n, java.util.Vector vAttributes)
          Computes the best attribute and the best split points for the given node and the given set of attributes. 
 void classifyLevels(Step s, int nLevels)
          Constructs a tree with 'nLevels' levels for the data corresponding to 'Step s'.
 void classifyTree(Step s)
          Constructs a tree for the data corresponding to 'Step s'.
abstract  java.lang.String getName()
          Returns the name of the classifier that appears in the Combobox for selection.
 void initPruning(AlgoClassifier classifier, Dataset[] validationDataset)
          Initializes the pruning method which is based on a separate validation set.
 void initPruning(AlgoClassifier classifier, int nFoldNumber)
          Initializes the pruning method which is based on cross-validation.
 Step optimizeTree(Step s)
          Prunes the tree which has 'Step s' as a root node.
 Step optimizeTree(Step s, JStatusBar bar)
          Prunes the tree which has 'Step s' as a root node. 
 void setParameters(float fPurity, int nSetNumber)
          Parameter setting for forward pruning.
 void setProposeSplitFlag(boolean ps)
 
abstract  int[] sortCategories(CategoryArray cat, Dataset[] sets, int nAttributIndex)
          Sorts the categories of a categorical attribute. 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

m_pruner

protected Pruner m_pruner

m_fPurity

protected float m_fPurity
The maximum purity threshold. During decision tree construction, this threshold can be used to assign the majority class to a node whose purity exceeds it.

m_nSetNumber

protected int m_nSetNumber
The threshold for the number of records in a node. During decision tree construction, this threshold can be used to assign the majority class to the node if it has fewer than 'm_nSetNumber' associated records.

m_proposeSplit

protected boolean m_proposeSplit
Flag indicating whether 'Propose Split' has been invoked.
Constructor Detail

AlgoClassifier

public AlgoClassifier()
Method Detail

setParameters

public final void setParameters(float fPurity,
                                int nSetNumber)
Parameter setting for forward pruning.
 
Parameters:
fPurity - purity threshold
nSetNumber - Threshold for the number of records
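
As an illustration of how the two thresholds could act as a stopping rule, here is a minimal sketch. It assumes that purity means the fraction of a node's records that belong to its majority class, which this page does not define; 'ForwardPruningSketch', 'shouldBecomeLeaf' and the 'classCounts' array are hypothetical names, not part of PBC.

```java
// Hypothetical helper; not part of the PBC API.
final class ForwardPruningSketch {

    // classCounts[i] is the number of records of class i in the node.
    // Returns true if the node should stop splitting and take its majority class.
    static boolean shouldBecomeLeaf(int[] classCounts, float fPurity, int nSetNumber) {
        int total = 0;
        int max = 0;
        for (int count : classCounts) {
            total += count;
            if (count > max) {
                max = count;
            }
        }
        if (total < nSetNumber) {
            return true; // fewer records than the record threshold (covers empty nodes)
        }
        if (total == 0) {
            return true; // guard against division by zero when nSetNumber <= 0
        }
        // Assumed definition of purity: share of the majority class.
        float purity = (float) max / total;
        return purity > fPurity;
    }

    // Index of the class with the most records (first one on ties).
    static int majorityClass(int[] classCounts) {
        int best = 0;
        for (int i = 1; i < classCounts.length; i++) {
            if (classCounts[i] > classCounts[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] counts = {95, 5};
        if (shouldBecomeLeaf(counts, 0.9f, 10)) {
            System.out.println("leaf with class " + majorityClass(counts));
        }
    }
}
```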

setProposeSplitFlag

public final void setProposeSplitFlag(boolean ps)

initPruning

public void initPruning(AlgoClassifier classifier,
                        int nFoldNumber)
Initializes the pruning method which is based on cross-validation.
 
Parameters:
classifier - Classifier which was used to build the tree.
nFoldNumber - Number of folds for cross-validation.

initPruning

public void initPruning(AlgoClassifier classifier,
                        Dataset[] validationDataset)
Initializes the pruning method which is based on a separate validation set.
 
Parameters:
classifier - Classifier which was used to build the tree.
validationDataset - Separate validation set on which the pruning is based.

optimizeTree

public Step optimizeTree(Step s)
Prunes the tree which has 'Step s' as a root node.
 
Parameters:
s - Root of the tree.
Returns:
The pruned tree.
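
The class description names cost-complexity pruning (CART) as the default technique. The sketch below illustrates only the weakest-link quantity used by that technique, (R(t) - R(T_t)) / (number of leaves of T_t - 1), on a toy binary tree. It is not PBC's implementation; 'CostComplexitySketch', 'TreeNode' and the error values are hypothetical.

```java
// Hypothetical illustration of the CART weakest-link measure; not PBC code.
final class CostComplexitySketch {

    static final class TreeNode {
        final double nodeError; // R(t): error if this node were turned into a leaf
        final TreeNode left;
        final TreeNode right;

        TreeNode(double nodeError, TreeNode left, TreeNode right) {
            this.nodeError = nodeError;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    // Number of leaves in the subtree rooted at t.
    static int leaves(TreeNode t) {
        return t.isLeaf() ? 1 : leaves(t.left) + leaves(t.right);
    }

    // R(T_t): summed error of the leaves below t.
    static double subtreeError(TreeNode t) {
        return t.isLeaf() ? t.nodeError : subtreeError(t.left) + subtreeError(t.right);
    }

    // Error increase per removed leaf if the subtree at the inner node t is collapsed.
    static double alpha(TreeNode t) {
        return (t.nodeError - subtreeError(t)) / (leaves(t) - 1);
    }

    public static void main(String[] args) {
        TreeNode root = new TreeNode(0.5,
                new TreeNode(0.125, null, null),
                new TreeNode(0.125, null, null));
        System.out.println(alpha(root));
    }
}
```

In CART, the inner node with the smallest value is collapsed first, which yields a sequence of progressively smaller trees from which one is selected using a validation set or cross-validation.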

optimizeTree

public Step optimizeTree(Step s,
                         JStatusBar bar)
Prunes the tree which has 'Step s' as a root node. The progress of the computation is reflected by the JStatusBar.
 
Parameters:
s - Root of the tree.
bar - Status bar that reflects the progress of the computation.
Returns:
The pruned tree.

sortCategories

public abstract int[] sortCategories(CategoryArray cat,
                                     Dataset[] sets,
                                     int nAttributIndex)
Sorts the categories of a categorical attribute. This method is also invoked before the data visualization to determine the order in which the categories are visualized. This category order must be the same as the one which is used to calculate/represent the split points.
 
Parameters:
cat - Array with the indices of the categories.
sets - Original data records.
nAttributIndex - Index of the attribute for which the sorting should be computed.
Returns:
Array with the sorted indices of the categories.
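
One possible ordering criterion, shown here only as a sketch, is to sort the category indices by the fraction of their records that belong to a chosen reference class. This page does not say which criterion a concrete classifier uses; 'CategoryOrderSketch', 'sortByClassFraction' and the 'counts' matrix are hypothetical and stand in for information that would be derived from 'cat' and 'sets'.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper; not part of the PBC API.
final class CategoryOrderSketch {

    // counts[c][k] is the number of records with category c and class k.
    // Returns the category indices ordered by ascending share of refClass.
    static int[] sortByClassFraction(int[][] counts, int refClass) {
        Integer[] order = new Integer[counts.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Arrays.sort on objects is stable, so ties keep their index order.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> fraction(counts[i], refClass)));
        int[] result = new int[order.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = order[i];
        }
        return result;
    }

    // Share of records in this category that belong to refClass (0 if empty).
    static double fraction(int[] row, int refClass) {
        int total = 0;
        for (int count : row) {
            total += count;
        }
        return total == 0 ? 0.0 : (double) row[refClass] / total;
    }

    public static void main(String[] args) {
        int[][] counts = {{1, 9}, {8, 2}, {5, 5}};
        System.out.println(Arrays.toString(sortByClassFraction(counts, 0)));
    }
}
```

Whatever criterion is chosen, the returned order has to be deterministic, since the same order is used both for the visualization and for the split points.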

calculateAttribute

public abstract void calculateAttribute(Node n,
                                        java.util.Vector vAttributes)
Computes the best attribute and the best split points for the given node and the given set of attributes. If the set of attributes is empty, all attributes are considered.
 
Parameters:
n - Node, for which the split is calculated.
vAttributes - The set of candidate attributes for which the best split is computed.

getName

public abstract java.lang.String getName()
Returns the name of the classifier that appears in the Combobox for selection.

classifyTree

public final void classifyTree(Step s)
Constructs a tree for the data corresponding to 'Step s'.
 
Parameters:
s - Step which becomes the root node of the tree.

classifyLevels

public final void classifyLevels(Step s,
                                 int nLevels)
Constructs a tree with 'nLevels' levels for the data corresponding to 'Step s'.
 
Parameters:
s - Root of the tree.
nLevels - Number of levels.