Synopsis
A highly efficient implementation of the forward selection scheme.
Description
This operator starts with an empty selection of attributes and, in each round, tentatively adds each unused attribute of the given set of examples in turn. For each candidate attribute, the performance is estimated using the inner operators, e.g. a cross-validation. Only the attribute giving the highest increase in performance is added to the selection. Then a new round is started with the modified selection. This implementation avoids any additional memory consumption besides the memory originally used for storing the data and the memory that might be needed for applying the inner operators.
The stopping behavior parameter specifies when the iteration is aborted. Three behaviors are possible:
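The round structure described above can be sketched in Python as follows. This is a minimal illustration, not the operator's actual implementation: the function name `forward_selection`, the `evaluate` callable (standing in for the inner operators, returning a performance where higher is better) and the choice of the "without increase" stopping behavior are all assumptions made for this sketch.

```python
from typing import Callable, List, Sequence, Tuple


def forward_selection(
    attributes: Sequence[str],
    evaluate: Callable[[List[str]], float],
    max_attributes: int,
) -> Tuple[List[str], float]:
    """Greedy forward selection (hypothetical sketch).

    `evaluate` is an assumed stand-in for the inner operators: it takes a
    list of attribute names and returns a performance (higher is better).
    """
    selected: List[str] = []
    best_performance = float("-inf")

    while len(selected) < max_attributes:
        unused = [a for a in attributes if a not in selected]
        if not unused:
            break
        # Tentatively add each unused attribute and estimate the performance.
        scored = [(evaluate(selected + [a]), a) for a in unused]
        round_performance, round_attribute = max(scored, key=lambda s: s[0])
        # Assumed stopping behavior for this sketch: "without increase".
        if round_performance <= best_performance:
            break
        # Keep only the attribute that gave the highest performance.
        selected.append(round_attribute)
        best_performance = round_performance

    return selected, best_performance
```

For example, `evaluate` could wrap a cross-validation of some learner restricted to the given attributes; the sketch only requires that it returns a single number per candidate selection.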
- without increase
- without increase of at least
- without significant increase
The parameter speculative_rounds defines how many further rounds are performed in a row after the stopping criterion has been fulfilled for the first time. If the performance increases again during the speculative rounds, the selection continues. Otherwise, all attributes selected during the speculative rounds are removed, as if no speculative rounds had been executed. This might help to avoid getting stuck in local optima. A subsequent backward elimination operator might remove unneeded attributes again.
The operator provides a value for logging the performance in each round using a Log operator.
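The effect of speculative rounds can be illustrated with a small Python sketch. It is written under stated assumptions rather than taken from the operator's source: the names `forward_selection_speculative` and `evaluate` are hypothetical, the stopping criterion is taken to be "without increase", and a speculative round counts as successful only if it beats the best performance seen so far.

```python
from typing import Callable, List, Sequence


def forward_selection_speculative(
    attributes: Sequence[str],
    evaluate: Callable[[List[str]], float],
    max_attributes: int,
    speculative_rounds: int,
) -> List[str]:
    """Forward selection with speculative rounds (hypothetical sketch)."""
    selected: List[str] = []
    confirmed: List[str] = []  # last selection that showed an increase
    best_performance = float("-inf")
    misses = 0  # consecutive rounds without an increase

    while len(selected) < max_attributes:
        unused = [a for a in attributes if a not in selected]
        if not unused:
            break
        performance, attribute = max(
            ((evaluate(selected + [a]), a) for a in unused),
            key=lambda s: s[0],
        )
        selected.append(attribute)
        if performance > best_performance:
            # Performance increased: confirm everything selected so far.
            best_performance = performance
            confirmed = list(selected)
            misses = 0
        else:
            # Stopping criterion fulfilled: continue only speculatively.
            misses += 1
            if misses > speculative_rounds:
                break

    # Attributes added only during unsuccessful speculative rounds are dropped.
    return confirmed
```

With `speculative_rounds=0` the sketch stops at the first round without an increase. With a larger value it can step over a plateau and keep the extra attributes only if a later round pays off.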
Input
- example set: expects: ExampleSet
Output
- example set:
- attribute weights:
- performance:
Parameters
- maximal number of attributes: The maximal number of forward selection steps and hence the maximal number of attributes.
- speculative rounds: Defines the number of times the stopping criterion may be consecutively ignored before the selection is actually stopped. A number higher than one might help to avoid getting stuck in local optima.
- stopping behavior: Defines the criterion on which the selection is stopped.
- use relative increase: If checked, the relative performance increase will be used as stopping criterion.
- minimal absolute increase: If the absolute performance increase compared to the last step drops below this threshold, the selection will be stopped.
- minimal relative increase: If the relative performance increase compared to the last step drops below this threshold, the selection will be stopped.
- alpha: The probability threshold which determines if differences are considered significant.
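As a rough illustration of how the threshold parameters above could interact for the "without increase of at least" behavior, consider the following Python sketch. The function name `increase_is_sufficient` and the handling of a previous performance of zero are assumptions, and the significance-based behavior controlled by alpha is deliberately left out.

```python
def increase_is_sufficient(
    previous: float,
    current: float,
    use_relative_increase: bool = False,
    minimal_absolute_increase: float = 0.0,
    minimal_relative_increase: float = 0.0,
) -> bool:
    """Return True if the selection should continue (hypothetical sketch)."""
    increase = current - previous
    if use_relative_increase:
        if previous == 0:
            # Relative increase is undefined here; this fallback is an assumption.
            return increase > 0
        return increase / abs(previous) >= minimal_relative_increase
    return increase >= minimal_absolute_increase
```

Under these assumptions, a step from 0.80 to 0.805 with a minimal absolute increase of 0.01 would stop the selection, whereas a step from 0.80 to 0.85 would let it continue.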