Synopsis
Identifies outliers in the given ExampleSet based on Class Outlier Factors (COF).
Description
This operator performs a Class Outlier Factor (COF) search. The Class Outliers method searches for observations (objects) that arouse suspicion when their class labels are taken into account, following the definition of a class outlier given by Hewahi and Saad in "A Comparative Study of Outlier Mining and Class Outlier Mining", CS Letters, Vol. 1, No. 1 (2009), and "Class Outliers Mining: Distance-Based Approach", International Journal of Intelligent Systems and Technologies, Vol. 2, No. 1, pp. 55-68, 2007.
It detects rare / exceptional / suspicious cases with respect to a group of similar cases.
The main concept of the ECODB (Enhanced Class Outlier - Distance Based) algorithm is to rank each instance in the dataset D, given the parameters N (the number of top class outliers to report) and K (the number of nearest neighbors). The rank of an instance T is its COF value, computed with the formula
COF(T) = PCL(T,K) - norm(Deviation(T)) + norm(KDist(T))
where:
- PCL(T,K) is the probability of the class label of the instance T with respect to the class labels of its K nearest neighbors.
- norm(Deviation(T)) and norm(KDist(T)) are the normalized values of Deviation(T) and KDist(T), respectively; both fall into the range [0, 1].
- Deviation(T) measures how much the instance T deviates from instances of the same class. It is computed by summing the distances between T and every instance that belongs to the same class as T.
- KDist(T) is the sum of the distances between the instance T and its K nearest neighbors.
The ECODB algorithm maintains a list of only the top N class outliers. The lower the COF value of an instance, the higher its priority as a class outlier.
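The ranking described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the operator's implementation. The function names are hypothetical, and it makes several assumptions that the description does not confirm: plain Euclidean distance on numerical attributes, an instance is not counted among its own neighbors, and normalization is min-max scaling over all instances.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_max_normalize(values):
    """Scale values into [0, 1] (assumed normalization; all zeros if constant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def class_outlier_factors(points, labels, k):
    """COF(T) = PCL(T,K) - norm(Deviation(T)) + norm(KDist(T)) for every T."""
    n = len(points)
    pcl, deviation, kdist = [], [], []
    for i in range(n):
        # Distances from instance i to every other instance, nearest first.
        dists = sorted(
            (euclidean(points[i], points[j]), j) for j in range(n) if j != i
        )
        neighbors = dists[:k]
        # PCL: share of the K nearest neighbors that carry the same label.
        same = sum(1 for _, j in neighbors if labels[j] == labels[i])
        pcl.append(same / len(neighbors))
        # KDist: sum of the distances to the K nearest neighbors.
        kdist.append(sum(d for d, _ in neighbors))
        # Deviation: sum of the distances to all instances of the same class.
        deviation.append(
            sum(
                euclidean(points[i], points[j])
                for j in range(n)
                if labels[j] == labels[i]
            )
        )
    dev_norm = min_max_normalize(deviation)
    kdist_norm = min_max_normalize(kdist)
    return [p - d + kd for p, d, kd in zip(pcl, dev_norm, kdist_norm)]

def top_n_class_outliers(cof, n):
    """Indices of the n instances with the lowest COF (strongest outliers first)."""
    return sorted(range(len(cof)), key=lambda i: cof[i])[:n]

# Two clusters; the instance at index 4 sits inside cluster "a" but is labeled "b".
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]

cof = class_outlier_factors(points, labels, k=3)
print(top_n_class_outliers(cof, 1))  # → [4]
```

In this toy data the mislabeled instance has no same-class neighbors (PCL = 0), the largest deviation from its own class, and the smallest KDist, so it receives the lowest COF and is ranked first.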
The operator supports mixed Euclidean distance. It takes an example set and passes it on with two new special attributes: a boolean-valued outlier attribute that indicates whether an example is among the top-N class outliers (true) or not (false), and a "COF Factor" attribute that measures the degree to which an example is a class outlier.
Input
- example set input: expects: ExampleSetMetaData: #examples: = 0; #attributes: 0
Output
- example set output:
- original:
Parameters
- number of neighbors: Specifies the value of k, the number of nearest neighbors to be analyzed. (default value is 10, minimum is 1, maximum is 1 million)
- number of class outliers: The number of top-N class outliers to look for. (default value is 10, minimum is 2 for internal reasons, maximum is 1 million)
- measure types: The measure type
- mixed measure: Select measure
- nominal measure: Select measure
- numerical measure: Select measure
- divergence: Select divergence
- kernel type: The kernel type
- kernel gamma: The kernel parameter gamma.
- kernel sigma1: The kernel parameter sigma1.
- kernel sigma2: The kernel parameter sigma2.
- kernel sigma3: The kernel parameter sigma3.
- kernel degree: The kernel parameter degree.
- kernel shift: The kernel parameter shift.
- kernel a: The kernel parameter a.
- kernel b: The kernel parameter b.