Synopsis
Identifies outliers in the given ExampleSet based on the data density.
Description
This operator is a DB outlier detection algorithm which calculates the DB(p,D)-outliers for an ExampleSet passed to the operator. DB(p,D)-outliers are distance-based outliers according to Knorr and Ng. A DB(p,D)-outlier is an object from which at least a proportion p of all objects are farther away than distance D. The operator implements a global homogeneous outlier search.
Currently, the operator supports cosine, sine, or squared distances in addition to the usual Euclidean distance; the distance function is selected by the corresponding parameter. The operator takes two further real-valued parameters, p and D. Based on these parameters, search objects are created from the examples in the ExampleSet passed to the operator. These search objects are added to a search space, which performs the outlier search according to the DB(p,D) scheme.
The outlier status (a boolean value) is written to a new special attribute "Outlier" and is passed on with the example set.
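The DB(p,D) test described above can be illustrated with a minimal brute-force sketch. This is not the operator's implementation: the function name `db_outliers`, the sample points, and the choice to measure the proportion against all n objects (with the object itself never counted as far away) are assumptions made for illustration, and only the Euclidean distance is used.

```python
import math

def db_outliers(points, p, d):
    """Flag DB(p, D)-outliers by brute force using Euclidean distance.

    Hypothetical helper for illustration only.
    """
    n = len(points)
    flags = []
    for i, a in enumerate(points):
        # Count the other objects that lie farther than d from a.
        far = sum(
            1 for j, b in enumerate(points)
            if j != i and math.dist(a, b) > d
        )
        # a is an outlier if at least a proportion p of all objects is that far away.
        flags.append(far >= p * n)
    return flags

# Four points close together and one isolated point (made-up data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
flags = db_outliers(points, p=0.7, d=1.0)
print(flags)
```

With these values only the isolated point is flagged, since four of the five objects lie farther than D = 1.0 from it. The sketch compares every pair of objects, so it is only suitable for small inputs.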
Input
- example set input: expects: ExampleSetMetaData: #examples: = 0; #attributes: 0
Output
- example set output:
- original:
Parameters
- distance: The distance threshold D used in the DB(p,D) outlier test.
- proportion: The proportion p of all objects that must lie farther away than D for an object to count as an outlier.
- distance function: Indicates which distance function is used to calculate the distance between two objects.
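How a selectable distance function might plug into the search can be sketched as follows. The names `euclidean`, `squared`, `cosine`, and `DISTANCES` are hypothetical, and the formulas are common textbook definitions assumed here, not taken from the operator. In particular, cosine distance is assumed to be one minus the cosine similarity, and the sine distance is left out because its definition is not given in this description.

```python
import math

def euclidean(a, b):
    # Square root of the summed squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def squared(a, b):
    # Squared Euclidean distance (assumed meaning of "squared distance").
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    # Assumed definition: 1 - cosine similarity; undefined for zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Hypothetical lookup table keyed by the value of the distance function parameter.
DISTANCES = {"euclidean": euclidean, "squared": squared, "cosine": cosine}

dist = DISTANCES["euclidean"]
print(dist((0.0, 0.0), (3.0, 4.0)))
```

Note that the threshold D has to be chosen on the scale of the selected function: a D that is sensible for Euclidean distance corresponds to D squared under the squared distance.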