Synopsis
Discretizes numerical attributes. Bin boundaries are chosen so as to minimize the entropy in the induced partitions.
Description
This operator discretizes the selected numeric attributes in the dataset into nominal attributes. The discretization is performed by selecting the bin boundary that minimizes the entropy in the induced partitions. The method is then applied recursively to both new partitions until the stopping criterion is reached. For details see a) Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning (Fayyad, Irani) and b) Supervised and Unsupervised Discretization of Continuous Features (Dougherty, Kohavi, Sahami). Special attributes, including the label, are skipped.
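The recursive procedure described above can be sketched in Python as follows. This is only an illustration, not the operator's actual implementation: the function names are hypothetical, and the stopping rule is assumed to be the MDL criterion from the cited Fayyad and Irani paper, since the stopping criterion is not spelled out here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(values, labels):
    """Return (boundary, weighted_entropy, index) with minimal entropy, or None."""
    n = len(values)
    best = None
    for i in range(1, n):
        if values[i] == values[i - 1]:
            continue  # a boundary must separate two distinct values
        w = (i * entropy(labels[:i]) + (n - i) * entropy(labels[i:])) / n
        if best is None or w < best[1]:
            best = ((values[i - 1] + values[i]) / 2, w, i)
    return best

def mdl_accepts(labels, left, right, split_entropy):
    """ASSUMED stopping rule: MDL criterion of Fayyad and Irani."""
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    gain = entropy(labels) - split_entropy
    delta = math.log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right)
    )
    return gain > (math.log2(n - 1) + delta) / n

def _recurse(vs, ls):
    split = best_split(vs, ls)
    if split is None:
        return []
    boundary, w, i = split
    if not mdl_accepts(ls, ls[:i], ls[i:], w):
        return []  # stopping criterion reached: keep this partition as one bin
    # Apply the same method to both new partitions.
    return _recurse(vs[:i], ls[:i]) + [boundary] + _recurse(vs[i:], ls[i:])

def discretize(values, labels):
    """Return the sorted bin boundaries for one numeric attribute."""
    pairs = sorted(zip(values, labels))
    return _recurse([v for v, _ in pairs], [l for _, l in pairs])
```

For example, `discretize([1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 15, 16, 17, 18], ["a"] * 8 + ["b"] * 8)` yields a single boundary between the two label groups, while an attribute whose values carry no information about the label yields no boundary at all.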
Please note that this operator removes all attributes with only one range (i.e., those attributes which are not actually discretized because the entropy criterion is not fulfilled). This behavior can be controlled with the remove_useless parameter.
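The effect of removing single-range attributes can be illustrated with a small Python sketch. The function name and the dictionary layout (attribute name mapped to its list of bin boundaries) are assumptions made for this example; only the parameter name remove_useless comes from this documentation.

```python
def drop_single_range(cut_points, remove_useless=True):
    """Drop attributes that received no bin boundary (a single range).

    cut_points: hypothetical mapping of attribute name -> list of boundaries.
    """
    if not remove_useless:
        return dict(cut_points)
    # An attribute with no boundary has exactly one range and is "useless".
    return {name: cuts for name, cuts in cut_points.items() if len(cuts) > 0}
```

With `remove_useless=True`, an attribute such as `"noise"` with an empty boundary list is dropped, while an attribute with at least one boundary is kept.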
Input
- example set input: expects: ExampleSetMetaData: #examples: = 0; #attributes: 0, Example set matching at least one selected attribute.
Output
- example set output:
- original:
- preprocessing model:
Parameters
- return preprocessing model: Indicates if the preprocessing model should also be returned.
- create view: Creates a view that applies the preprocessing instead of changing the data.
- attribute filter type: The condition specifies which attributes are selected or affected by this operator.
- attribute: The attribute which should be chosen.
- attributes: The attributes which should be chosen.
- regular expression: A regular expression for the names of the attributes which should be kept.
- use except expression: If enabled, an exception to the specified regular expression can be specified. Attributes matching this exception will be filtered out, even though they match the first expression.
- except regular expression: A regular expression for the names of the attributes which should be filtered out although matching the above regular expression.
- value type: The value type of the attributes.
- use value type exception: If enabled, an exception to the specified value type might be specified. Attributes of this type will be filtered out, although matching the first specified type.
- except value type: Except this value type.
- block type: The block type of the attributes.
- use block type exception: If enabled, an exception to the specified block type might be specified.
- except block type: Except this block type.
- numeric condition: Parameter string for the condition, e.g. '>= 5'
- invert selection: Indicates if only those attributes should be accepted which would normally be filtered out.
- include special attributes: Indicates if this operator should also be applied to the special attributes. Otherwise they are always kept.
- remove useless: Indicates if useless attributes, i.e. those containing only one single range, should be removed.
- range name type: Indicates if long range names including the limits should be used.
- automatic number of digits: Indicates if the number of digits should be automatically determined for the range names.
- number of digits: The minimum number of digits used for the interval names (-1: determine minimal number automatically).
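The interplay of the regular expression, except regular expression and invert selection parameters can be sketched in Python as follows. This is a hedged illustration only: the function name is hypothetical, and it assumes that an expression must match the whole attribute name, which may differ from the operator's actual matching rules.

```python
import re

def select_attributes(names, regular_expression,
                      except_regular_expression=None,
                      invert_selection=False):
    """Return the attribute names selected by the regex-based filter."""
    keep = re.compile(regular_expression)
    drop = (re.compile(except_regular_expression)
            if except_regular_expression else None)
    selected = []
    for name in names:
        # ASSUMPTION: the expression has to match the full attribute name.
        matched = keep.fullmatch(name) is not None
        # The except expression removes names that matched the first one.
        if matched and drop is not None and drop.fullmatch(name):
            matched = False
        # Invert selection accepts exactly the names that would be filtered.
        if matched != invert_selection:
            selected.append(name)
    return selected
```

For the names `["a1", "a2", "b1", "label"]`, the expression `a.*` selects `a1` and `a2`; adding the except expression `a2` leaves only `a1`; and enabling invert selection on top of that returns `a2`, `b1` and `label`.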