Synopsis
A batched cross-validation that estimates the performance of a learning operator using predefined example batches.
Description
BatchXValidation encapsulates a cross-validation process. The example set S is split up into number_of_validations subsets S_i. The inner operators are applied number_of_validations times, using S_i as the test set (input of the second inner operator) and S \ S_i as the training set (input of the first inner operator).
In contrast to the usual cross-validation operator (see XValidation), this operator does not (randomly) split the data itself but uses the partition defined by the special attribute "batch". This can be an arbitrary nominal or integer attribute where each possible value occurs at least once (since many learning schemes depend on this minimum number of examples).
The first inner operator must accept an ExampleSet while the second must accept an ExampleSet and the output of the first (which is in most cases a Model) and must produce a PerformanceVector.
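The batch-driven procedure described above can be sketched in plain Python. This is only an illustration under stated assumptions, not RapidMiner's implementation: examples are modelled as dictionaries, and the names batch_folds, batch_cross_validate, train_majority and accuracy are hypothetical stand-ins for the operator and its two inner operators.

```python
from collections import Counter, defaultdict


def batch_folds(examples, batch_key="batch"):
    """Yield (batch_value, training, test) once per distinct batch value."""
    groups = defaultdict(list)
    for example in examples:
        groups[example[batch_key]].append(example)
    for value, test in groups.items():
        # Training set is S \ S_i: every example outside the current batch.
        training = [ex for ex in examples if ex[batch_key] != value]
        yield value, training, test


def batch_cross_validate(examples, train, evaluate, batch_key="batch"):
    """Apply train (first inner step) and evaluate (second inner step) per batch."""
    performances = []
    for _, training, test in batch_folds(examples, batch_key):
        model = train(training)                      # ExampleSet -> Model
        performances.append(evaluate(model, test))   # (Model, ExampleSet) -> performance
    return performances


# Hypothetical inner steps: a majority-class learner and an accuracy measure.
def train_majority(training):
    return Counter(ex["label"] for ex in training).most_common(1)[0][0]


def accuracy(model, test):
    return sum(ex["label"] == model for ex in test) / len(test)


examples = [
    {"batch": 1, "label": "yes"},
    {"batch": 1, "label": "yes"},
    {"batch": 2, "label": "yes"},
    {"batch": 2, "label": "no"},
    {"batch": 3, "label": "yes"},
    {"batch": 3, "label": "no"},
    {"batch": 3, "label": "yes"},
]

performances = batch_cross_validate(examples, train_majority, accuracy)
```

In this sketch the number of iterations equals the number of distinct values of the batch attribute, so the folds are fully determined by the data rather than by a random split.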
The cross-validation operator provides several values which can be logged by means of a ProcessLogOperator. The number of the current iteration can be logged, which might be useful for ProcessLog operators wrapped inside a cross-validation. Besides that, all performance estimation operators of RapidMiner provide access to the average values calculated during the estimation. Since the operator cannot guarantee the names of the delivered criteria, the ProcessLog operator can access the values via the generic value names:
- performance: the value for the main criterion calculated by this validation operator
- performance1: the value of the first criterion of the performance vector calculated
- performance2: the value of the second criterion of the performance vector calculated
- performance3: the value of the third criterion of the performance vector calculated
- for the main criterion, the variance and the standard deviation can also be accessed where applicable.
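How the generic value names relate to the per-iteration results can be sketched as follows. This is a rough illustration, not the operator's actual code: it assumes the main criterion is the first one in the performance vector, uses the population variance (the source does not say which estimator is used), and the keys "variance" and "deviation" as well as the function summarize are hypothetical names chosen for this example.

```python
from statistics import mean, pstdev, pvariance


def summarize(per_iteration):
    """Average each criterion over all iterations and expose generic names.

    per_iteration is a list of dicts, one per validation iteration, mapping
    criterion name -> value. All dicts are assumed to share the same keys in
    the same order; the first key is treated as the main criterion.
    """
    names = list(per_iteration[0])
    averages = {
        name: mean([run[name] for run in per_iteration]) for name in names
    }

    main = names[0]
    main_values = [run[main] for run in per_iteration]

    logged = {
        "performance": averages[main],         # main criterion
        "variance": pvariance(main_values),    # assumed: population variance
        "deviation": pstdev(main_values),      # assumed: population std. dev.
    }
    # performance1, performance2, ... follow the order of the criteria.
    for index, name in enumerate(names, start=1):
        logged[f"performance{index}"] = averages[name]
    return logged


runs = [
    {"accuracy": 0.8, "recall": 0.6},
    {"accuracy": 0.9, "recall": 0.7},
    {"accuracy": 1.0, "recall": 0.8},
]
logged = summarize(runs)
```

Because the values are addressed by position rather than by criterion name, a logging step written against performance1 or performance2 keeps working when the inner evaluation delivers differently named criteria.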
Input
- training: expects: ExampleSet
Output
- model:
- training:
- averagable 1:
- averagable 2:
Parameters
- create complete model: Indicates if a model of the complete data set should additionally be built after estimation.
- average performances only: Indicates if only performance vectors should be averaged or all types of averagable result vectors.