Synopsis
Creates new attributes from a nominal attribute by dividing the nominal values into parts according to a split criterion.
Description
This operator creates new attributes from a nominal attribute by dividing the nominal values into parts according to a split criterion (regular expression). It provides two different modes, selected by the parameter "split mode".
Ordered Splits
In the first split mode, called ordered_split, the resulting attributes get the name of the original attribute together with a number indicating the order. For example, if the original data contained the values
attribute-name
value1
value2, value3
value3
and should be divided by the separating commas, the resulting attributes would be attribute-name1, attribute-name2, attribute-name3 with the tuples (value1, ?, ?), (value2, value3, ?), and (value3, ?, ?), respectively. This mode is useful if the original values indicated some order like, for example, a preference.
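The ordered mode can be illustrated with a minimal Python sketch. It is not the operator's implementation: the function name ordered_split, the attribute name "preference", and the sample values are hypothetical, and the sketch assumes that the number of new columns equals the largest number of parts found in any value. The pattern `,\s*` is used so that the blank after each comma does not end up inside the values.

```python
import re

def ordered_split(name, values, pattern):
    """Split each value by `pattern` and spread the parts over numbered columns."""
    rows = [re.split(pattern, value) for value in values]
    # Assumption of this sketch: one column per position up to the longest row.
    width = max((len(parts) for parts in rows), default=0)
    columns = [f"{name}{i}" for i in range(1, width + 1)]
    # Pad shorter rows with "?" to mark missing values, as in the text above.
    padded = [tuple(parts + ["?"] * (width - len(parts))) for parts in rows]
    return columns, padded

# Hypothetical data: each value lists colours in order of preference.
columns, rows = ordered_split(
    "preference",
    ["red", "green, blue", "blue, red, green"],
    r",\s*",
)
print(columns)
for row in rows:
    print(row)
```

Each part keeps its position, so the first new attribute always holds the first choice, the second holds the second choice, and so on.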
Unordered Splits
In the second split mode, called unordered_split, the resulting attributes get the name of the original attribute together with the value for each of the occurring values. For example, if the original data contained the values
attribute-name
value1
value2, value3
value3
and again should be divided by the separating commas, the resulting attributes would be attribute-name-value1, attribute-name-value2, and attribute-name-value3 with the tuples (true, false, false), (false, true, true), and (false, false, true), respectively. This mode is useful if the order is not important and the goal is a basket-like data set containing all occurring values.
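The unordered mode can be sketched in the same way. Again this is only an illustration under stated assumptions: the function name unordered_split is hypothetical, the columns are ordered by first occurrence of each value (the operator may order them differently), and Python booleans stand in for the true/false values.

```python
import re

def unordered_split(name, values, pattern):
    """Create one true/false column per distinct part found in `values`."""
    rows = [re.split(pattern, value) for value in values]
    # Collect the distinct parts in order of first occurrence.
    seen = []
    for parts in rows:
        for part in parts:
            if part not in seen:
                seen.append(part)
    columns = [f"{name}-{part}" for part in seen]
    # A flag is True when the corresponding part occurs in that row.
    flags = [tuple(part in parts for part in seen) for parts in rows]
    return columns, flags

# The example data from the text, split at commas with optional whitespace.
columns, rows = unordered_split(
    "attribute-name",
    ["value1", "value2, value3", "value3"],
    r",\s*",
)
print(columns)
for row in rows:
    print(row)
```

With this data the sketch yields the three columns and the three tuples listed above; the position of a value inside the original string no longer matters.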
Input
- example set input: expects: ExampleSetMetaData: #examples: = 0; #attributes: 0
Output
- example set output:
- original:
Parameters
- attribute filter type: The condition specifies which attributes are selected or affected by this operator.
- attribute: The attribute which should be chosen.
- attributes: The attributes which should be chosen.
- regular expression: A regular expression for the names of the attributes which should be kept.
- use except expression: If enabled, an exception to the specified regular expression can be specified. Attributes matching this exception are filtered out even if they match the first expression.
- except regular expression: A regular expression for the names of the attributes which should be filtered out although matching the above regular expression.
- value type: The value type of the attributes.
- use value type exception: If enabled, an exception to the specified value type can be specified. Attributes of the excepted type are filtered out even if they match the first specified type.
- except value type: Except this value type.
- block type: The block type of the attributes.
- use block type exception: If enabled, an exception to the specified block type might be specified.
- except block type: Except this block type.
- numeric condition: Parameter string for the condition, e.g. '>= 5'
- invert selection: Indicates whether only those attributes should be accepted which would normally be filtered out.
- include special attributes: Indicates whether this operator should also be applied to the special attributes. Otherwise they are always kept.
- split pattern: The pattern (a regular expression) which is used for dividing the nominal values into different parts.
- split mode: The split mode of this operator, either ordered splits (keeping the original order) or unordered (keeping basket-like information).
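The interplay of the parameters "regular expression" and "except regular expression" listed above can be pictured with a short Python sketch. The function name select_attributes and the attribute names are hypothetical, and the sketch assumes that a pattern has to match the whole attribute name; the operator's own regular expression dialect may differ from Python's in details.

```python
import re

def select_attributes(names, regular_expression, except_regular_expression=None):
    """Keep names matching the first pattern unless they match the exception."""
    selected = []
    for name in names:
        # Assumption of this sketch: the pattern must match the entire name.
        if not re.fullmatch(regular_expression, name):
            continue
        if except_regular_expression and re.fullmatch(except_regular_expression, name):
            continue
        selected.append(name)
    return selected

# Hypothetical attribute names of an example set.
names = ["tags", "tags_raw", "title", "id"]
chosen = select_attributes(names, r"t.*", r".*_raw")
print(chosen)
```

Here "tags_raw" matches the first expression but is dropped because it also matches the exception, while "id" never matches the first expression at all.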