Synopsis
This learner efficiently calculates all frequent item sets from the given data.
Description
This operator calculates all frequent item sets from a data set by building an FPTree data structure on the transaction database. The FPTree is a highly compressed copy of the data which in many cases fits into main memory even for large databases. All frequent item sets are then derived from this FPTree. A major advantage of FPGrowth over Apriori is that it needs only two scans of the data and is therefore often applicable even to large data sets.
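The two data scans mentioned above can be illustrated with a minimal sketch. It is not the operator's actual implementation: the names FPNode, build_fp_tree and count_nodes are hypothetical, and the sketch only builds the tree, it does not mine it. The first scan counts the items, the second scan inserts each transaction into a prefix tree with the most frequent items first, so that transactions sharing a prefix share nodes.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item, a count, and child nodes."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_count):
    # Scan 1: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in counts.items() if c >= min_count}

    # Scan 2: insert each transaction, keeping only frequent items,
    # ordered by descending count so that common prefixes are shared.
    root = FPNode()
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-counts[i], i))
        node = root
        for item in items:
            node = node.children.setdefault(item, FPNode(item))
            node.count += 1
    return root, counts


def count_nodes(node):
    """Number of item nodes below the given node (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


baskets = [["bread", "milk"], ["bread", "butter"],
           ["bread", "milk", "butter"], ["milk"]]
tree, item_counts = build_fp_tree(baskets, min_count=2)
print(count_nodes(tree))
```

In this small example the four baskets contain eight item occurrences in total, while the tree needs only five nodes, which is where the compression comes from.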
Please note that the given data set is only allowed to contain binominal attributes, i.e. nominal attributes with only two different values. Simply use the provided preprocessing operators in order to transform your data set. The necessary operators are the discretization operators for changing the value types of numerical attributes to nominal and the operator Nominal2Binominal for transforming nominal attributes into binominal / binary ones.
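The effect of these two preprocessing steps can be sketched in plain Python. This is only an illustration of the idea, not the code of the discretization operators or of Nominal2Binominal: the function names, the bin boundaries and the naming scheme of the generated attributes are assumptions made for the example.

```python
def discretize(value, boundaries, labels):
    """Map a number to the label of the first bin whose upper bound it does not exceed.

    labels must contain one entry more than boundaries (the last bin is open-ended).
    """
    for bound, label in zip(boundaries, labels):
        if value <= bound:
            return label
    return labels[-1]


def nominal_to_binominal(rows, attribute):
    """Replace one nominal column by one true/false column per distinct value."""
    values = sorted({row[attribute] for row in rows})
    result = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != attribute}
        for v in values:
            # Hypothetical naming scheme for the generated attributes.
            new_row[f"{attribute} = {v}"] = (row[attribute] == v)
        result.append(new_row)
    return result


rows = [{"age": 25, "color": "red"}, {"age": 70, "color": "blue"}]
for row in rows:
    row["age"] = discretize(row["age"], [30, 60], ["young", "middle", "old"])
rows = nominal_to_binominal(rows, "age")
rows = nominal_to_binominal(rows, "color")
print(rows[0])
```

After both steps every attribute has exactly two values, which is the form this operator expects.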
The frequent item sets are mined for the positive entries in your database, i.e. for those nominal values which are defined as positive in your database. If you use an attribute description file (.aml) for the ExampleSource operator, this corresponds to the second value which is defined via the classes attribute or inner value tags.
If your data does not specify the positive entries correctly, you may set them using the parameter positive_value. This only works if all your attributes contain this value!
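How a positive value turns rows into transactions can be sketched as follows. The function to_transactions and the sample values "yes" and "no" are hypothetical and only serve as an illustration: each row becomes the set of attribute names that hold the positive value.

```python
def to_transactions(rows, positive_value="true"):
    """Turn each row into the set of attribute names holding the positive value."""
    return [{name for name, value in row.items() if value == positive_value}
            for row in rows]


rows = [{"bread": "yes", "milk": "no"},
        {"bread": "yes", "milk": "yes"}]

print(to_transactions(rows, positive_value="yes"))
# A value that no attribute contains yields only empty transactions.
print(to_transactions(rows, positive_value="true"))
```

The second call shows why the positive value has to occur in all attributes: an attribute that never takes this value can never become part of a transaction.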
This operator has two basic working modes: finding at least the specified number of item sets with highest support without taking the min_support into account (default), or finding all item sets with a support larger than min_support.
Input
- example set: expects: ExampleSet
Output
- example set:
- frequent sets:
Parameters
- find min number of itemsets: Indicates if the minimal support should be decreased automatically until the specified minimum number of frequent item sets is found. The defined minimal support is lowered by 20 percent each time.
- min number of itemsets: Indicates the minimum number of itemsets which should be determined if the corresponding parameter is activated.
- max number of retries: This determines how many times the operator lowers min support to find the minimal number of item sets. Each time the minimal support is lowered by 20 percent.
- positive value: This parameter determines which value of the binominal attributes is treated as positive. Attributes with that value are considered as part of a transaction. If left blank, the example set determines which value is used.
- min support: The minimal support necessary in order to be a frequent item (set).
- max items: The upper bound for the length of the item sets (-1: no upper bound)
- must contain: A regular expression describing the items that any generated rule must contain. Leave empty if there is no restriction.
- keep example set: Indicates if the example set is kept.
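The interplay of min support, max items, must contain and the automatic lowering of the minimal support can be sketched as follows. This is a rough illustration under several assumptions, not the operator's code: a brute-force enumeration stands in for FPGrowth, the function names are hypothetical, "lowered by 20 percent" is read as multiplying the current value by 0.8, a set counts as frequent when its support is at least min support, and must contain is read as "at least one item fully matches the regular expression".

```python
import re
from itertools import combinations


def frequent_item_sets(transactions, min_support, max_items=-1, must_contain=""):
    """Brute-force stand-in for FPGrowth: enumerate candidates, keep the frequent ones."""
    items = sorted(set().union(*transactions))
    limit = len(items) if max_items == -1 else min(max_items, len(items))
    pattern = re.compile(must_contain) if must_contain else None
    found = {}
    for size in range(1, limit + 1):
        for candidate in combinations(items, size):
            # Assumption: at least one item has to match the expression.
            if pattern and not any(pattern.fullmatch(i) for i in candidate):
                continue
            hits = sum(1 for t in transactions if set(candidate) <= t)
            support = hits / len(transactions)
            if support >= min_support:
                found[candidate] = support
    return found


def mine_with_retries(transactions, min_support, min_number_of_itemsets,
                      max_number_of_retries, **options):
    """Lower min_support by 20 percent per retry until enough item sets are found."""
    result = frequent_item_sets(transactions, min_support, **options)
    retries = 0
    while len(result) < min_number_of_itemsets and retries < max_number_of_retries:
        min_support *= 0.8  # assumption: 20 percent of the current value
        retries += 1
        result = frequent_item_sets(transactions, min_support, **options)
    return result, min_support


baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
sets_found, final_support = mine_with_retries(
    baskets, min_support=0.95, min_number_of_itemsets=2, max_number_of_retries=10)
print(sorted(sets_found), round(final_support, 3))
```

In this example no item set reaches a support of 0.95 or 0.76, so the minimal support is lowered twice, to about 0.608, at which point the two single items bread and milk (support 0.75 each) are found.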