Abstract: Outliers are objects that do not comply with the general behavior or model of the data. Applications in astronomy need fast tools for outlier detection in data sets that are large, high-dimensional, and of unknown distribution. Existing algorithms for outlier detection are too slow for such applications. We present an algorithm, based on an innovative use of k-d trees, that does not assume any probability model and is linear in both the number of objects and the number of dimensions. We also provide experimental results showing that it is a practical solution to this problem.
This talk is based on work done jointly with Alex Szalay and Andrew Moore.