Rapid Outlier Detection
Fast computation of distance-based outlierness scores via sampling
Summary
An efficient algorithm for outlier detection that performs sampling only once and measures the outlierness of each data point by its distance to the nearest neighbor in the sample set. The algorithm has the following advantages:
- Scalable; the time complexity is linear in the number of data points,
- Effective; it is empirically shown to be the most effective on average among existing distance-based outlier detection methods, and
- Easy to use; the only input is the number of samples, and a small sample size (the default is 20) is shown to be a good choice.
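The scoring rule described above can be illustrated with a minimal Python sketch. This is not the C or R implementation offered for download; the function name `sampling_outlier_scores`, its parameters, the use of Euclidean distance, and the choice to let a sampled point skip itself when searching for its nearest neighbor are all assumptions made for illustration.

```python
import math
import random


def sampling_outlier_scores(points, n_samples=20, seed=None):
    """Score each point by its distance to the nearest point in one random sample.

    Hypothetical helper for illustration; not the authors' code.
    """
    rng = random.Random(seed)
    n = len(points)
    k = min(n_samples, n)
    # Sampling is performed once, without replacement.
    sample_idx = rng.sample(range(n), k)
    scores = []
    for i, p in enumerate(points):
        # Assumption: a sampled point ignores itself, since its
        # distance to itself would always be zero.
        dists = [math.dist(p, points[j]) for j in sample_idx if j != i]
        scores.append(min(dists) if dists else 0.0)
    return scores


# Four clustered points and one point far away from them.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = sampling_outlier_scores(data, n_samples=3, seed=0)
most_outlying = max(range(len(data)), key=scores.__getitem__)
```

Each point is compared only against the sampled points, so for a fixed sample size the work grows linearly with the number of data points. In this toy data set the isolated point at (5.0, 5.0) receives the largest score whichever three points are sampled.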
Code
C implementation: Download code.zip (ZIP, 421 KB)
R package: Download spoutlier.zip (ZIP, 6 KB)
Also available on GitHub
Further information and publication
Please see the following paper for detailed information, and cite it in your published research.
Rapid Distance-Based Outlier Detection via Sampling
Mahito Sugiyama and Karsten Borgwardt
Advances in Neural Information Processing Systems 26 (NIPS 2013), 467-475
Online | ETH Research Collection | Project page including code
Contact: Mahito Sugiyama