Synopsis
Improved version of Yet Another GGA (Generating Genetic Algorithm).
Description
YAGGA is an acronym for Yet Another Generating Genetic Algorithm. Its approach to generating new attributes differs from the original one. The (generating) mutation can do one of the following things with different probabilities:
- Probability p/4: Add a newly generated attribute to the feature vector
- Probability p/4: Add a randomly chosen original attribute to the feature vector
- Probability p/2: Remove a randomly chosen attribute from the feature vector
Thus it is guaranteed that the length of the feature vector can both grow and shrink. On average it will keep its original length, unless longer or shorter individuals prove to have a better fitness.
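The mutation step described above can be sketched as follows. This is a minimal illustration, not the operator's actual implementation: the function name yagga_mutate, the generate_attribute callback, and the representation of a feature vector as a list of attribute names are all assumptions made for the example. The sketch also does not check whether an added original attribute is already present.

```python
import random


def yagga_mutate(features, original_attributes, p, generate_attribute, rng=random):
    """Apply one generating-mutation step to a feature vector.

    Hypothetical sketch: `features` and `original_attributes` are lists of
    attribute names, and `generate_attribute(features, rng)` is an assumed
    callback that builds a new attribute from the existing ones.
    """
    result = list(features)
    r = rng.random()
    if r < p / 4:
        # Probability p/4: add a newly generated attribute.
        result.append(generate_attribute(result, rng))
    elif r < p / 2:
        # Probability p/4: add a randomly chosen original attribute.
        result.append(rng.choice(original_attributes))
    elif r < p:
        # Probability p/2: remove a randomly chosen attribute.
        if result:
            result.pop(rng.randrange(len(result)))
    # With probability 1 - p the feature vector is left unchanged.
    return result
```

The expected change in length per mutation is p/4 + p/4 - p/2 = 0, which is why the feature vector keeps its original length on average when selection does not favor longer or shorter individuals.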
Compared with the usual YAGGA operator, this operator allows more feature generators and provides several techniques for intron prevention. This leads to smaller example sets containing fewer redundant features.
Since this operator does not contain algorithms to extract features from value series, it is restricted to example sets with only single attributes. For (automatic) feature extraction from value series, the value series plugin for RapidMiner should be used.
For more information please refer to
Mierswa, Ingo (2007): RobustGP: Intron-Free Multi-Objective Feature Construction (to appear)
Input
- example set in: expects: ExampleSet
Output
- example set out:
- attribute weights out:
- performance out:
Parameters
- limit max total number of attributes: Indicates if the total number of attributes in all generations should be limited.
- max total number of attributes: Max total number of attributes in all generations.
- use local random seed: Indicates if a local random seed should be used.
- local random seed: Specifies the local random seed.
- show stop dialog: Determines whether a dialog with a stop button is displayed. Pressing the button stops the run and returns the best individual.
- maximal fitness: The optimization will stop if the fitness reaches the defined maximum.
- population size: Number of individuals per generation.
- maximum number of generations: Number of generations after which to terminate the algorithm.
- use plus: Generate sums.
- use diff: Generate differences.
- use mult: Generate products.
- use div: Generate quotients.
- reciprocal value: Generate reciprocal values.
- use early stopping: Enables early stopping. If unchecked, the maximum number of generations is always performed.
- generations without improval: Stop criterion: stop after n generations without improvement of the performance.
- tournament size: The fraction of the current population which should be used as tournament members (only tournament selection).
- start temperature: The scaling temperature (only Boltzmann selection).
- dynamic selection pressure: If set to true, the selection pressure is increased to its maximum over the course of the complete optimization run (only Boltzmann and tournament selection).
- keep best individual: If set to true, the best individual of each generation is guaranteed to be selected for the next generation (elitist selection).
- p initialize: Initial probability for an attribute to be switched on.
- p crossover: Probability for an individual to be selected for crossover.
- crossover type: Type of the crossover.
- use heuristic mutation probability: If checked, the probability for mutations is set to 1/(number of attributes).
- p mutation: Probability for mutation.
- use square roots: Generate square root values.
- use power functions: Generate powers of one attribute raised to another.
- use sin: Generate sine values.
- use cos: Generate cosine values.
- use tan: Generate tangent values.
- use atan: Generate arc tangent values.
- use exp: Generate exponential functions.
- use log: Generate logarithmic functions.
- use absolute values: Generate absolute values.
- use min: Generate minimum values.
- use max: Generate maximum values.
- use sgn: Generate signum values.
- use floor ceil functions: Generate floor, ceil, and rounded values.
- restrictive selection: Use restrictive generator selection (faster).
- remove useless: Remove useless attributes.
- remove equivalent: Remove equivalent attributes.
- equivalence samples: Number of samples to check when testing two attributes for equivalence.
- equivalence epsilon: Consider two attributes equivalent if their difference is no greater than epsilon.
- equivalence use statistics: Recalculates attribute statistics before the equivalence check.
- unused functions: Space-separated list of functions which are not allowed in arguments for attribute construction.
- constant generation prob: Generate random constant attributes with this probability.
- associative attribute merging: Post processing after crossover (only possible for runs with only one generator).
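How the equivalence samples and equivalence epsilon parameters might interact can be sketched as follows. This is an illustration under stated assumptions, not the operator's implementation: the function name attributes_equivalent is hypothetical, attributes are represented as plain lists of numeric values, and the samples are assumed to be randomly chosen rows. The source does not specify how the operator actually draws its samples.

```python
import random


def attributes_equivalent(col_a, col_b, samples, epsilon, rng=random):
    """Sampled equivalence test for two numeric attribute columns.

    Hypothetical sketch: checks `samples` randomly chosen rows and treats
    the attributes as equivalent if no sampled difference exceeds `epsilon`.
    """
    n = min(len(col_a), len(col_b))
    # Never request more distinct rows than are available.
    indices = rng.sample(range(n), min(samples, n))
    return all(abs(col_a[i] - col_b[i]) <= epsilon for i in indices)


x = [1.0, 2.0, 3.0, 4.0]
doubled_by_sum = [v + v for v in x]     # e.g. generated as x + x
doubled_by_mult = [2.0 * v for v in x]  # e.g. generated as 2 * x
shifted = [v + 1.0 for v in x]          # a genuinely different attribute

print(attributes_equivalent(doubled_by_sum, doubled_by_mult, samples=10, epsilon=0.05))
print(attributes_equivalent(doubled_by_sum, shifted, samples=10, epsilon=0.05))
```

A sampled check of this kind is a heuristic: more samples make it less likely that two different attributes are mistaken for equivalent ones, at the cost of extra computation per comparison.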