Synopsis
Generates decision trees to classify nominal and numerical data.
Description
This operator learns decision trees from both nominal and numerical data. Decision trees are powerful classification methods that are often also easy to understand. To classify an example, the tree is traversed top-down. Every node in a decision tree is labelled with an attribute, and the example's value for that attribute determines which outgoing edge is taken. For nominal attributes there is one outgoing edge per possible attribute value; for numerical attributes the outgoing edges are labelled with disjoint ranges.
This decision tree learner works similarly to Quinlan's C4.5 or CART. Roughly speaking, the tree induction algorithm works as follows: whenever a new node is created, an attribute is picked that maximises the discriminative power of that node with respect to the examples assigned to its subtree. This discriminative power is measured by a criterion which can be selected by the user (information gain, gain ratio, Gini index, etc.).
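As an illustration, two of these criteria can be computed as follows. The function names (`entropy`, `gini`, `information_gain`) are illustrative helpers, not part of the operator itself:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a label sequence."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(labels, partitions):
    """Entropy reduction achieved by splitting `labels` into `partitions`."""
    n = len(labels)
    return entropy(labels) - sum(len(p) / n * entropy(p) for p in partitions)

# A perfectly discriminative split removes all uncertainty: gain = 1 bit.
labels = ["yes", "yes", "no", "no"]
print(information_gain(labels, [["yes", "yes"], ["no", "no"]]))  # 1.0
```

The criterion with the best value over all candidate attributes (and, for numerical attributes, over all candidate split points) determines the split at each node.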
The algorithm stops in various cases:
- No attribute reaches the minimal gain threshold (minimal_gain).
- The maximal depth is reached.
- The current subtree contains fewer than a certain number of examples (minimal_size_for_split).
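The stopping conditions above can be sketched in a simplified induction loop. This is an illustrative sketch for nominal attributes only, using information gain as the criterion; the names and data representation are assumptions, not the operator's actual implementation:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(examples, attributes):
    """Pick the nominal attribute whose partition maximises information gain.
    Each example is a (feature_dict, label) pair."""
    labels = [label for _, label in examples]
    best = (None, -1.0, None)
    for attribute in attributes:
        partitions = {}
        for features, label in examples:
            partitions.setdefault(features[attribute], []).append((features, label))
        gain = entropy(labels) - sum(
            len(p) / len(examples) * entropy([l for _, l in p])
            for p in partitions.values())
        if gain > best[1]:
            best = (attribute, gain, partitions)
    return best

def grow_tree(examples, attributes, depth=0,
              minimal_size_for_split=4, minimal_gain=0.1, maximal_depth=20):
    labels = [label for _, label in examples]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop: node too small, maximal depth reached, or node already pure.
    if (len(examples) < minimal_size_for_split
            or depth >= maximal_depth
            or len(set(labels)) == 1):
        return majority
    attribute, gain, partitions = best_split(examples, attributes)
    # Stop: no attribute reaches the minimal_gain threshold.
    if attribute is None or gain < minimal_gain:
        return majority
    return {attribute: {value: grow_tree(subset, attributes, depth + 1,
                                         minimal_size_for_split,
                                         minimal_gain, maximal_depth)
                        for value, subset in partitions.items()}}

# Tiny nominal data set: the label depends entirely on "outlook".
data = [({"outlook": "sunny"}, "yes")] * 3 + [({"outlook": "rain"}, "no")] * 3
print(grow_tree(data, ["outlook"]))
# {'outlook': {'sunny': 'yes', 'rain': 'no'}}
```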
Finally, the tree is pruned, i.e. leaves that do not contribute to the discriminative power of the whole tree are removed.
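A minimal sketch of such pruning, assuming a tree represented as either a label string (leaf) or a `{attribute: {value: subtree}}` dict, and using a simple C4.5-style pessimistic estimate (0.5 continuity correction per leaf); the operator's actual confidence-based error calculation is more involved:

```python
from collections import Counter

def pessimistic_errors(tree, examples):
    """Sum of leaf errors, with a 0.5 continuity correction per leaf."""
    if isinstance(tree, str):  # leaf: count misclassified examples
        return sum(1 for _, label in examples if label != tree) + 0.5
    (attribute, children), = tree.items()
    return sum(
        pessimistic_errors(subtree,
                           [e for e in examples if e[0][attribute] == value])
        for value, subtree in children.items())

def pruned(tree, examples):
    """Bottom-up pruning: replace a subtree with a majority-label leaf
    whenever that does not increase the pessimistic error estimate."""
    if isinstance(tree, str) or not examples:
        return tree
    (attribute, children), = tree.items()
    children = {value: pruned(subtree,
                              [e for e in examples if e[0][attribute] == value])
                for value, subtree in children.items()}
    tree = {attribute: children}
    majority = Counter(label for _, label in examples).most_common(1)[0][0]
    if pessimistic_errors(majority, examples) <= pessimistic_errors(tree, examples):
        return majority
    return tree

# A split whose children all predict "yes" adds nothing: it is pruned away.
data = [({"a": "x"}, "yes")] * 4 + [({"a": "y"}, "yes")]
print(pruned({"a": {"x": "yes", "y": "yes"}}, data))  # yes
```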
Input
- training set: expects: ExampleSet
Output
- model: the decision tree learned from the training set.
- exampleSet: the training set, passed through unchanged.
Parameters
- criterion: Specifies the used criterion for selecting attributes and numerical splits.
- minimal size for split: The minimal size of a node in order to allow a split.
- minimal leaf size: The minimal size of all leaves.
- minimal gain: The minimal gain which must be achieved in order to produce a split.
- maximal depth: The maximum tree depth (-1: no bound).
- confidence: The confidence level used for the pessimistic error calculation of pruning.
- number of prepruning alternatives: The number of alternative nodes tried when prepruning would prevent a split.
- no pre pruning: Disables prepruning and delivers a tree without any prepruning.
- no pruning: Disables the pruning and delivers an unpruned tree.