Foundations of Data Science Bootcamp Tutorial: Algorithmic High-Dimensional Robust Statistics

Abstract

Fitting a model to a collection of observations is one of the quintessential problems in statistics. The typical assumption is that the data was generated by a model of a given type (e.g., a mixture model). This is a simplifying assumption that is only approximately valid, as real datasets are typically exposed to some source of contamination. Hence, any estimator designed for a particular model must also be robust in the presence of corrupted data. Until recently, even for the most basic problem of robustly estimating the mean of a high-dimensional dataset, all known robust estimators were hard to compute.

A recent line of work in theoretical computer science obtained the first computationally efficient robust estimators for a range of high-dimensional estimation tasks. In this tutorial talk, we will survey the algorithmic techniques underlying these estimators and the connections between them. We will illustrate these techniques for the problems of robust mean and covariance estimation. Finally, we will discuss new directions and opportunities for future work.
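To make the contamination setting in the abstract concrete, here is a minimal one-dimensional sketch using only the Python standard library. It does not implement any of the algorithms surveyed in the tutorial. It only shows why a non-robust estimator such as the sample mean fails once a small fraction of the data is corrupted. The function name `contaminated_sample`, the 5% contamination level, and the outlier value are all assumptions chosen for illustration.

```python
import random
import statistics


def contaminated_sample(n, eps, outlier_value, seed=0):
    """Draw n points from N(0, 1), then overwrite an eps-fraction with outlier_value.

    Hypothetical helper for illustration; the corruption model here is
    deliberately crude (a single far-away value).
    """
    rng = random.Random(seed)
    points = [rng.gauss(0.0, 1.0) for _ in range(n)]
    num_bad = int(eps * n)
    for i in range(num_bad):
        points[i] = outlier_value
    return points


# True mean of the uncorrupted distribution is 0.
sample = contaminated_sample(n=10_000, eps=0.05, outlier_value=1_000.0)

mean_estimate = statistics.fmean(sample)     # pulled far from 0 by the outliers
median_estimate = statistics.median(sample)  # stays close to 0

print(f"sample mean:   {mean_estimate:.3f}")
print(f"sample median: {median_estimate:.3f}")
```

In one dimension the median already gives a robust estimate. The difficulty the abstract refers to arises in high dimensions, where simple generalizations such as the coordinate-wise median incur error that grows with the dimension.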

Slides

[ppt] [pdf]

Video

[lecture_1] [lecture_2]

Key References

Robust Estimators in High Dimensions without the Computational Intractability

Agnostic Estimation of Mean and Covariance

Being Robust (in High Dimensions) Can Be Practical

Statistical Query Lower Bounds for Robust Estimation of High-dimensional Gaussians and Gaussian Mixtures

Robustly Learning a Gaussian: Getting Optimal Error, Efficiently

Tutorial: Algorithmic High-Dimensional Robust Statistics

Location: Simons Institute for the Theory of Computing, August 30, 2018

Presenter: Ilias Diakonikolas
