Description and Outline
Fitting a model to a collection of observations is one of the quintessential questions in statistics and machine learning.
The typical assumption is that the data was generated by a model of a given type (e.g., a mixture model).
This is a simplifying assumption that is only approximately valid, as
real datasets are typically exposed to some source of contamination.
Hence, any estimator designed for a particular model
must also be robust
in the presence of corrupted/noisy data.
Classical work in robust statistics, starting with the pioneering works of Tukey and Huber in the 1960s,
pinned down the fundamental information-theoretic aspects of high-dimensional
robust estimation. In contrast, until very recently, the computational aspects have been poorly understood.
In particular, even for the basic problem of robustly estimating the mean of a high-dimensional dataset,
all known robust estimators were hard to compute.
Moreover, the accuracy of the known heuristics (e.g., RANSAC)
degrades polynomially as the dimension increases. This state of affairs prompted the following natural question:
Can we reconcile robustness and computational efficiency in high-dimensional estimation?
A recent line of work in theoretical computer science obtained the first computationally
efficient robust estimators for a range of high-dimensional estimation tasks.
In this tutorial, we will survey the key algorithmic ideas in these recent developments.
We will complement these positive results by discussing the fundamental limits of robust estimation.
Finally, we will discuss new directions and opportunities for future work.