Rapid Outlier Detection

Fast computation of distance-based outlierness scores via sampling

Mahito Sugiyama, Karsten Borgwardt

Rapid Distance-Based Outlier Detection via Sampling

Summary

This is an efficient algorithm for outlier detection. It draws a sample once and measures the outlierness of each data point as the distance from that point to its nearest neighbor in the sample set. The algorithm has the following advantages:

Scalable; the time complexity is linear in the number of data points,
Effective; it is empirically shown to be the most effective on average among existing distance-based outlier detection methods, and
Easy to use; the only input is the number of samples, and a small sample size (the default value is 20) is shown to be a good choice.
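The scoring rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the C or R code distributed on this page. The function name `sampling_outlier_scores`, its parameters, the use of Euclidean distance, and the choice to skip a point when it is compared against itself in the sample are all assumptions made for this example.

```python
import math
import random


def sampling_outlier_scores(points, n_samples=20, seed=None):
    """Score each point by its distance to the nearest point in one random sample.

    Hypothetical helper for illustration; higher scores mean "more outlying".
    """
    rng = random.Random(seed)
    k = min(n_samples, len(points))
    # Draw the sample once, without replacement; it is reused for every point.
    sample_idx = rng.sample(range(len(points)), k)

    scores = []
    for i, p in enumerate(points):
        # Assumption of this sketch: a sampled point is not compared with itself,
        # so that it does not trivially receive a score of zero.
        dists = [math.dist(p, points[j]) for j in sample_idx if j != i]
        scores.append(min(dists) if dists else 0.0)
    return scores


if __name__ == "__main__":
    # A tight 7 x 7 grid of inliers plus one distant point.
    data = [(0.1 * x, 0.1 * y) for x in range(7) for y in range(7)]
    data.append((100.0, 100.0))
    result = sampling_outlier_scores(data, n_samples=20, seed=0)
    print("index of highest score:", result.index(max(result)))
```

Each point is compared against at most `n_samples` sampled points, so the work grows linearly with the number of data points for a fixed sample size, in line with the scalability claim above.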

Code

C implementation: Download code.zip (ZIP, 421 KB)

R package: Download spoutlier.zip (ZIP, 6 KB)

Also available on GitHub

Further information and publication

Please see the following paper for detailed information, and cite it in your published research.

Rapid Distance-Based Outlier Detection via Sampling

Mahito Sugiyama and Karsten Borgwardt
Advances in Neural Information Processing Systems 26 (NIPS 2013), 467-475
Online | ETH Research Collection | Project page including code

Contact: Mahito Sugiyama