Detect Outlier (Distances)


Synopsis

Identifies n outliers in the given ExampleSet based on the distance to their k nearest neighbors.


Description

This operator performs a D^k_n Outlier Search according to the outlier detection approach recommended by Ramaswamy, Rastogi and Shim in "Efficient Algorithms for Mining Outliers from Large Data Sets". It is primarily a statistical outlier search based on a distance measure similar to the DB(p,D)-Outlier Search from Knorr and Ng. But it utilizes a distance search through the k-th nearest neighbourhood, so it implements some sort of locality as well.

The method states, that those objects with the largest distance to their k-th nearest neighbours are likely to be outliers respective to the data set, because it can be assumed, that those objects have a more sparse neighbourhood than the average objects. As this effectively provides a simple ranking over all the objects in the data set according to the distance to their k-th nearest neighbours, the user can specify a number of n objects to be the top-n outliers in the data set.

The operator supports cosine, sine or squared distances in addition to the euclidian distance which can be specified by a distance parameter. The Operator takes an example set and passes it on with an boolean top-n D^k outlier status in a new boolean-valued special outlier attribute indicating true (outlier) and false (no outlier).


Input


Output


Parameters


ExampleProcess