Yao Qi, Raja Das, et al.
ISSTA 2009
Defining outliers by their distance to neighboring data points has been shown to be an effective non-parametric approach to outlier detection. In recent years, many research efforts have looked at developing fast distance-based outlier detection algorithms. Several of the existing distance-based outlier detection algorithms report log-linear time performance as a function of the number of data points on many real low-dimensional datasets. However, these algorithms are unable to deliver the same level of performance on high-dimensional datasets, since their scaling behavior is exponential in the number of dimensions. In this paper, we present RBRP, a fast algorithm for mining distance-based outliers, particularly targeted at high-dimensional datasets. RBRP scales log-linearly as a function of the number of data points and linearly as a function of the number of dimensions. Our empirical evaluation demonstrates that we outperform the state-of-the-art algorithm, often by an order of magnitude. © 2008 Springer Science+Business Media, LLC.
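For illustration only, the distance-based definition mentioned in the abstract can be sketched as a brute-force baseline. This is not RBRP; it is the naive quadratic approach that fast algorithms aim to improve on. It assumes an outlier score equal to the Euclidean distance to the k-th nearest neighbor, and the function names `knn_outlier_scores` and `top_outliers` are hypothetical.

```python
import math
from typing import List, Sequence


def knn_outlier_scores(points: Sequence[Sequence[float]], k: int) -> List[float]:
    """Score each point by the distance to its k-th nearest neighbor.

    Brute force: O(n^2 * d) distance work for n points in d dimensions.
    This is an assumed baseline for illustration, not the RBRP algorithm.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, in ascending order.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_outliers(points: Sequence[Sequence[float]], k: int, n: int) -> List[int]:
    """Return the indices of the n points with the largest k-NN distance."""
    scores = knn_outlier_scores(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_outliers(data, k=2, n=1))
```

In this sketch the point far from the unit-square cluster receives the largest score, because its second-nearest neighbor is much farther away than any neighbor of the clustered points.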
Marshall W. Bern, Howard J. Karloff, et al.
Theoretical Computer Science
Eric Price, David P. Woodruff
FOCS 2011
Daniel M. Bikel, Vittorio Castelli
ACL 2008