Synopsis
This operator encapsulates iterated bootstrap sampling, with performance evaluation on the examples that were not sampled.
Description
This validation operator draws several bootstrap samples (sampling with replacement) from the input set and trains a model on each sample. The remaining examples, i.e. those that were not sampled, form a test set on which the model is evaluated. This process is repeated for the specified number of iterations, after which the average performance is calculated.
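The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not RapidMiner's implementation: the function name `bootstrap_validation`, the `train` and `evaluate` callables, and the choice to skip an iteration whose test set is empty are all hypothetical.

```python
import random
from statistics import mean

def bootstrap_validation(examples, train, evaluate,
                         number_of_validations=10, sample_ratio=1.0, seed=None):
    """Return (average score, per-iteration scores) over bootstrap splits."""
    rng = random.Random(seed)
    n = len(examples)
    sample_size = max(1, round(n * sample_ratio))
    scores = []
    for _ in range(number_of_validations):
        # Draw training indices with replacement.
        train_idx = [rng.randrange(n) for _ in range(sample_size)]
        drawn = set(train_idx)
        # Examples that were never drawn form the test set.
        test_idx = [i for i in range(n) if i not in drawn]
        if not test_idx:
            continue  # assumption: skip iterations with nothing left to test on
        model = train([examples[i] for i in train_idx])
        scores.append(evaluate(model, [examples[i] for i in test_idx]))
    if not scores:
        raise ValueError("no iteration produced a non-empty test set")
    return mean(scores), scores

# Hypothetical learner: always predict the most frequent training label.
def train_majority(rows):
    labels = [label for _, label in rows]
    return max(sorted(set(labels)), key=labels.count)

# Hypothetical performance criterion: accuracy on the test rows.
def accuracy(model, rows):
    return sum(1 for _, label in rows if label == model) / len(rows)

data = [(i, "a" if i % 3 else "b") for i in range(30)]
avg, per_iteration = bootstrap_validation(
    data, train_majority, accuracy, number_of_validations=5, seed=42)
```

Here `train` plays the role of the first inner operator (it delivers a model) and `evaluate` the role of the second (it delivers a performance value).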
The basic setup is the same as for the usual cross validation operator: the first inner operator must provide a model and the second a performance vector. Please note that this operator does not take example weights into account, i.e. weights specified in a weight column.
This validation operator provides several values which can be logged by means of a ProcessLog operator. All performance estimation operators of RapidMiner provide access to the average values calculated during the estimation. Since the operator cannot know the names of the delivered criteria in advance, the ProcessLog operator can access the values via the following generic value names:
- performance: the value for the main criterion calculated by this validation operator
- performance1: the value of the first criterion of the performance vector calculated
- performance2: the value of the second criterion of the performance vector calculated
- performance3: the value of the third criterion of the performance vector calculated
- the variance and the standard deviation of the main criterion can also be accessed where applicable.
Input
- training: expects: ExampleSet
Output
- model:
- training:
- averagable 1:
- averagable 2:
Parameters
- create complete model: Indicates if a model of the complete data set should additionally be built after estimation.
- number of validations: The number of validations that should be executed.
- sample ratio: The fraction of examples that is sampled (with replacement) in each iteration.
- use weights: If checked, example weights will be used for bootstrapping if such weights are available.
- average performances only: Indicates if only performance vectors should be averaged or all types of averagable result vectors.
- use local random seed: Indicates if a local random seed should be used.
- local random seed: Specifies the local random seed.
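How the sample ratio, use weights, and local random seed parameters might interact when a single bootstrap sample is drawn can be sketched as follows. This is a hedged illustration, not the operator's actual code: the helper `draw_bootstrap_indices` is hypothetical, and it assumes that weights act as relative selection probabilities and that a fixed seed makes the draw reproducible.

```python
import random

def draw_bootstrap_indices(n, sample_ratio=1.0, weights=None, seed=None):
    """Draw round(n * sample_ratio) example indices with replacement."""
    # A dedicated generator stands in for a "local" random seed.
    rng = random.Random(seed)
    k = max(1, round(n * sample_ratio))
    # random.choices samples with replacement; weights=None means uniform.
    return rng.choices(range(n), weights=weights, k=k)

# Uniform sampling of half as many indices as there are examples.
uniform = draw_bootstrap_indices(10, sample_ratio=0.5, seed=1)

# Weighted sampling: an example with weight 0 is never drawn.
weighted = draw_bootstrap_indices(3, weights=[0, 0, 1], seed=1)
```

With a sample ratio below 1.0, fewer draws are made per iteration, so more examples tend to remain for the test set.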