Synopsis
A SimpleValidation randomly splits up the example set into a training and test set and evaluates the model.
Description
A RandomSplitValidationChain splits up the example set into a training set and a test set and evaluates the model. The first inner operator must accept an ExampleSet. The second inner operator must accept an ExampleSet and the output of the first (in most cases a Model) and must produce a PerformanceVector.
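The two-step chain described above can be illustrated with a minimal Python sketch. It is not RapidMiner code: the names `split_validation`, `train_majority`, and `accuracy` are hypothetical stand-ins for the operator and its two inner operators, and the example set is assumed to be a plain list of (attributes, label) pairs.

```python
import random

def split_validation(examples, train, evaluate, split_ratio=0.7, seed=None):
    """Hold-out validation: shuffle, split, train on one part, test on the other."""
    rng = random.Random(seed)            # a dedicated generator, like a local random seed
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = round(split_ratio * len(shuffled))
    training, test = shuffled[:cut], shuffled[cut:]
    model = train(training)              # first inner step: example set -> model
    performance = evaluate(test, model)  # second inner step: example set + model -> performance
    return model, performance

def train_majority(training):
    """Hypothetical learner: always predicts the most frequent training label."""
    labels = [label for _, label in training]
    return max(set(labels), key=labels.count)

def accuracy(test, model):
    """Hypothetical evaluator: fraction of test examples whose label matches the prediction."""
    correct = sum(1 for _, label in test if label == model)
    return {"accuracy": correct / len(test)}

# Toy example set: 75 "yes" and 25 "no" examples.
data = [(i, "yes" if i % 4 else "no") for i in range(100)]
model, performance = split_validation(data, train_majority, accuracy,
                                      split_ratio=0.7, seed=42)
```

The sketch only mirrors the data flow: the first step sees the training part alone, and the second step receives the test part together with whatever the first step produced.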
This validation operator provides several values which can be logged by means of a ProcessLogOperator. All performance estimation operators of RapidMiner provide access to the average values calculated during the estimation. Since the operator cannot know in advance the names of the delivered criteria, the ProcessLog operator can access the values via the following generic value names:
- performance: the value for the main criterion calculated by this validation operator
- performance1: the value of the first criterion of the performance vector calculated
- performance2: the value of the second criterion of the performance vector calculated
- performance3: the value of the third criterion of the performance vector calculated
- for the main criterion, the variance and the standard deviation can also be accessed where applicable.
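The generic naming scheme above can be pictured with a short Python sketch. It is an illustration only, not the operator's implementation: the function `generic_log_values` is hypothetical, the performance vector is assumed to be an ordered list of (criterion name, value) pairs, and treating the first criterion as the main one when none is given is an assumption made for this example.

```python
def generic_log_values(performance_vector, main_criterion=None):
    """Map ordered (criterion name, value) pairs to generic log value names."""
    if not performance_vector:
        return {}
    names = [name for name, _ in performance_vector]
    # Assumption: fall back to the first criterion if no main criterion is named.
    main = main_criterion if main_criterion in names else names[0]
    values = {}
    for index, (name, value) in enumerate(performance_vector, start=1):
        values[f"performance{index}"] = value   # positional name: performance1, performance2, ...
        if name == main:
            values["performance"] = value       # generic name for the main criterion
    return values

logged = generic_log_values(
    [("accuracy", 0.91), ("precision", 0.88), ("recall", 0.84)]
)
```

The point of the positional names is that a logging step can request `performance2` without knowing whether the second criterion happens to be precision, recall, or something else.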
Input
- training: expects: ExampleSet
Output
- model:
- training:
- averagable 1:
- averagable 2:
Parameters
- create complete model: Indicates if a model of the complete data set should additionally be built after estimation.
- split: Specifies how the example set should be split.
- split ratio: Relative size of the training set
- training set size: Absolute size required for the training set (-1: use rest for training)
- test set size: Absolute size required for the test set (-1: use rest for testing)
- sampling type: Defines the sampling type of the validation (linear = consecutive subsets, shuffled = random subsets, stratified = random subsets with class distribution kept constant)
- use local random seed: Indicates if a local random seed should be used.
- local random seed: Specifies the local random seed
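The three sampling types can be compared with a small Python sketch. This is a hedged illustration under simple assumptions, not the operator's actual algorithm: the function `split_indices` and its arguments are hypothetical, only the relative split is covered, and the `seed` argument plays the role of a local random seed.

```python
import random
from collections import defaultdict

def split_indices(labels, split_ratio=0.7, sampling_type="shuffled", seed=None):
    """Return (training indices, test indices) for one of three sampling types."""
    rng = random.Random(seed)  # private generator, so results are reproducible per seed
    n = len(labels)
    indices = list(range(n))
    if sampling_type == "linear":
        # Consecutive subsets: the first part trains, the rest tests.
        cut = round(split_ratio * n)
        return indices[:cut], indices[cut:]
    if sampling_type == "shuffled":
        # Random subsets: shuffle first, then cut.
        rng.shuffle(indices)
        cut = round(split_ratio * n)
        return indices[:cut], indices[cut:]
    if sampling_type == "stratified":
        # Random subsets per class, so both parts keep the class distribution.
        by_class = defaultdict(list)
        for i, label in enumerate(labels):
            by_class[label].append(i)
        training, test = [], []
        for members in by_class.values():
            rng.shuffle(members)
            cut = round(split_ratio * len(members))
            training.extend(members[:cut])
            test.extend(members[cut:])
        return training, test
    raise ValueError(f"unknown sampling type: {sampling_type}")

labels = ["a"] * 80 + ["b"] * 20
linear_train, linear_test = split_indices(labels, 0.5, "linear")
strat_train, strat_test = split_indices(labels, 0.5, "stratified", seed=1)
```

With sorted labels as in this toy data, the linear split puts only class "a" into the training part, while the stratified split keeps the 80:20 class ratio in both parts. That difference is the usual reason to prefer stratified sampling for classification data.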