Synopsis
A genetic algorithm for feature selection.
Description
A genetic algorithm for feature selection. Mutation switches features on and off; crossover interchanges the used features between individuals. Selection is done by roulette wheel. Genetic algorithms are general-purpose optimization and search algorithms that are suitable when little or no problem knowledge is available.
A genetic algorithm works as follows:
1. Generate an initial population consisting of population_size individuals. Each attribute is switched on with probability p_initialize.
2. For all individuals in the population:
  - Perform mutation, i.e. set used attributes to unused with probability p_mutation and vice versa.
  - Choose two individuals from the population and perform crossover with probability p_crossover. The type of crossover can be selected by crossover_type.
3. Perform selection: map all individuals to sections on a roulette wheel whose size is proportional to the individual's fitness, and draw population_size individuals at random according to their probability.
4. As long as the fitness improves, go to 2.
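The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the operator's actual implementation: the function names, the default values, the uniform crossover, and the toy fitness function are assumptions made for the example, and the fitness is assumed to be non-negative so that it can be used as a roulette-wheel weight.

```python
import random


def roulette_select(population, fitnesses, k, rng):
    """Draw k individuals with probability proportional to their fitness."""
    if sum(fitnesses) <= 0:
        # Degenerate wheel (all fitness values are zero): sample uniformly.
        return [list(rng.choice(population)) for _ in range(k)]
    return [list(ind) for ind in rng.choices(population, weights=fitnesses, k=k)]


def uniform_crossover(a, b, rng):
    """Swap each attribute flag between two individuals with probability 0.5.

    Assumption: the source only says the type is set by crossover_type.
    """
    for i in range(len(a)):
        if rng.random() < 0.5:
            a[i], b[i] = b[i], a[i]


def genetic_feature_selection(fitness, n_attributes, population_size=5,
                              p_initialize=0.5, p_mutation=-1.0,
                              p_crossover=0.5, max_generations=30,
                              generations_without_improval=2, seed=1992):
    rng = random.Random(seed)
    if p_mutation < 0:
        # -1 stands for 1 / number of attributes.
        p_mutation = 1.0 / n_attributes

    # Step 1: each attribute is switched on with probability p_initialize.
    population = [[rng.random() < p_initialize for _ in range(n_attributes)]
                  for _ in range(population_size)]
    best, best_fitness, stale = None, float("-inf"), 0

    for _ in range(max_generations):
        # Step 2a: mutation flips each attribute flag with probability p_mutation.
        for ind in population:
            for i in range(n_attributes):
                if rng.random() < p_mutation:
                    ind[i] = not ind[i]
        # Step 2b: pick two individuals and cross them with probability p_crossover.
        for _ in range(population_size // 2):
            if rng.random() < p_crossover:
                a, b = rng.sample(population, 2)
                uniform_crossover(a, b, rng)

        # Step 4: stop once the best fitness has not improved for a while.
        fitnesses = [fitness(ind) for ind in population]
        top = max(range(population_size), key=fitnesses.__getitem__)
        if fitnesses[top] > best_fitness:
            best, best_fitness, stale = list(population[top]), fitnesses[top], 0
        else:
            stale += 1
            if stale >= generations_without_improval:
                break

        # Step 3: roulette wheel selection of the next generation.
        population = roulette_select(population, fitnesses, population_size, rng)

    return best, best_fitness


def toy_fitness(ind):
    """Hypothetical fitness: reward attributes 0 and 3, penalize subset size."""
    return sum(ind[i] for i in (0, 3)) / (1 + 0.1 * sum(ind))


if __name__ == "__main__":
    mask, score = genetic_feature_selection(toy_fitness, n_attributes=6)
    print(mask, score)
```

In the operator itself, the fitness of an individual comes from the performance delivered by the inner process for that attribute subset; the toy function only stands in for it here.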
Input
- example set in: expects: ExampleSetMetaData: #examples: = 0; #attributes: 0
- attribute weights in: optional: AttributeWeights
- through 1:
Output
- example set out:
- weights:
- performance:
Parameters
- use exact number of attributes: Determines if only combinations containing exactly the specified number of attributes should be tested.
- restrict maximum: If checked, the maximum number of attributes can be restricted. Otherwise, combinations with any number of attributes are generated and tested.
- min number of attributes: Determines the minimum number of features used for the combinations.
- max number of attributes: Determines the maximum number of features used for the combinations.
- exact number of attributes: Determines the exact number of features used for the combinations.
- initialize with input weights: Indicates if this operator should look for attribute weights in the given input and use the input weights of all known attributes as starting point for the optimization.
- population size: Number of individuals per generation.
- maximum number of generations: Number of generations after which to terminate the algorithm.
- use early stopping: Enables early stopping. If unchecked, the maximum number of generations is always performed.
- generations without improval: Stop criterion: stop after n generations without improvement of the performance.
- normalize weights: Indicates if the final weights should be normalized.
- use local random seed: Indicates if a local random seed should be used.
- local random seed: Specifies the local random seed.
- show stop dialog: Determines if a dialog with a button that stops the run should be displayed. When the run is stopped this way, the best individual is returned.
- user result individual selection: Determines if the user wants to select the final result individual from the last population.
- show population plotter: Determines if the current population should be displayed in performance space.
- plot generations: Update the population plotter in these generations.
- constraint draw range: Determines if the draw range of the population plotter should be constrained between 0 and 1.
- draw dominated points: Determines if only points which are not Pareto dominated should be painted.
- population criteria data file: The path to the file in which the criteria data of the final population should be saved.
- maximal fitness: The optimization will stop if the fitness reaches the defined maximum.
- selection scheme: The selection scheme of this EA.
- tournament size: The fraction of the current population which should be used as tournament members.
- start temperature: The scaling temperature.
- dynamic selection pressure: If set to true, the selection pressure is increased to maximum during the complete optimization run.
- keep best individual: If set to true, the best individual of each generation is guaranteed to be selected for the next generation (elitist selection).
- save intermediate weights: Determines if the intermediate best results should be saved.
- intermediate weights generations: Determines how often the intermediate best results are saved: every k generations for the specified value of k.
- intermediate weights file: The file into which the intermediate weights will be saved.
- p initialize: Initial probability for an attribute to be switched on.
- p mutation: Probability for an attribute to be changed (a value of -1 means 1 / numberOfAtt).
- p crossover: Probability for an individual to be selected for crossover.
- crossover type: Type of the crossover.
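To illustrate how tournament size and keep best individual might work together, here is a minimal Python sketch of tournament selection with elitism. It is an assumption-laden illustration rather than the operator's code: the function name, its signature, and the rounding of the tournament size are hypothetical, and only the ideas of "a fraction of the population forms each tournament" and "the best individual always survives" are taken from the parameter descriptions above.

```python
import random


def tournament_select(population, fitnesses, tournament_fraction=0.25,
                      keep_best=True, rng=None):
    """Build the next generation by repeated tournaments (hypothetical helper)."""
    rng = rng or random.Random()
    n = len(population)
    # tournament size is a fraction of the population; use at least one member.
    members = min(n, max(1, round(tournament_fraction * n)))
    next_population = []

    if keep_best:
        # Elitist selection: the best individual is always carried over.
        best = max(range(n), key=fitnesses.__getitem__)
        next_population.append(list(population[best]))

    while len(next_population) < n:
        # Draw the tournament members and keep the fittest of them.
        contestants = rng.sample(range(n), members)
        winner = max(contestants, key=fitnesses.__getitem__)
        next_population.append(list(population[winner]))

    return next_population
```

A larger fraction makes each tournament more likely to contain the fittest individuals, which raises the selection pressure; with a fraction of 1.0 every tournament is won by the overall best individual.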