Topic Description
Fitting a model to a collection of observations is one of the quintessential questions in machine learning.
The standard assumption is that the data was generated by a model of a given type (e.g., a mixture model).
This simplifying assumption is at best only approximately valid, as
real datasets are typically exposed to some source of contamination.
Hence, any estimator designed for a particular model that is to be used in practice
must also be
robust in the presence of model misspecification. This is the prototypical goal in
robust statistics, a field that took shape in the 1960s
with the pioneering works of Tukey and Huber.
Until recently, even for the basic problem
of robustly estimating the mean of a high-dimensional dataset,
all known robust estimators were hard to compute.
Moreover, the quality of common heuristics (e.g., RANSAC) degrades badly as the dimension increases.
This state of affairs prompted the following natural question:
Can we reconcile robustness and computational efficiency in high-dimensional estimation?
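To make the failure mode concrete, here is a minimal numerical sketch (not part of the workshop materials; the contamination rate and outlier shift are illustrative assumptions): in d dimensions, an ε-fraction of planted outliers drags the empirical mean far from the truth, while even a simple coordinate-wise median stays much closer, though its error still grows with the dimension, which is the gap the efficient algorithms discussed here address.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 10_000, 100, 0.05          # sample size, dimension, contamination rate

X = rng.standard_normal((n, d))        # inliers drawn from N(0, I_d); true mean is the origin
X[: int(eps * n)] += 50.0              # adversary shifts a 5% fraction of the points

err_mean = np.linalg.norm(X.mean(axis=0))          # roughly eps * 50 * sqrt(d) = 25
err_median = np.linalg.norm(np.median(X, axis=0))  # roughly O(eps * sqrt(d)), under 1 here

print(f"empirical mean error:         {err_mean:.2f}")
print(f"coordinate-wise median error: {err_median:.2f}")
```

A single adversarial direction suffices to move the empirical mean by an amount growing with the outlier magnitude, while the coordinate-wise median's error is bounded independently of that magnitude.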
A recent line of work in computer science has sought to circumvent this apparent computational
barrier for a range of natural generative models. Specifically, two concurrent works gave the first provably robust and
efficiently computable estimators for several fundamental questions that were previously thought to be computationally intractable.
These include robust estimation of mean and covariance in high dimensions, and robust learning of various latent variable models.
Since the dissemination of these works, there has been a flurry of research activity on algorithmic robust estimation
in a variety of high-dimensional settings. Moreover, the developed ideas have been useful in
several practical applications, including finding patterns in genomic datasets
and improving the security of machine learning systems.
Goals and Target Audience
The notion of robustness lies at the core of machine learning.
The first objective of the workshop will be to introduce the local machine learning community
to the new insights and techniques in the exciting area of algorithmic robust statistics.
During the last couple of years, several algorithmic techniques
for robust high-dimensional estimation have been developed by various groups
of researchers. There are intriguing connections between these techniques,
and the relations among them can best be drawn out when they are presented together.
By bringing a number of experts together, we expect to foster collaboration
and open exchange of ideas that will significantly push the boundary of the field.
This includes both circumventing technical obstacles and discussing new directions
and opportunities for future work.
Specifically, we will examine the following concrete questions:
What are the limits of current algorithmic techniques, and
can we develop new techniques that yield efficient robust estimators
for richer families of problems and models? Is there a formal connection between adversarial
robustness and other notions of algorithmic stability, such as differential privacy and adaptive data analysis?
Can we leverage insights from the theory of robustness to improve the security
of machine learning systems against adversarial attacks?
List of Participants
Pranjal Awasthi (Rutgers)
Sivaraman Balakrishnan (CMU)
Nina Balcan (CMU)
Moses Charikar (Stanford)
Yu Cheng (Duke)
Ilias Diakonikolas (USC)
Chao Gao (Chicago)
Sam Hopkins (Cornell)
Prateek Jain (MSR)
Gautam Kamath (MIT)
Daniel Kane (UCSD)
Ravi Kannan (MSR)
Weihao Kong (Stanford)
Pravesh Kothari (Princeton)
Po-Ling Loh (UW Madison)
Stas Minsker (USC)
Ankur Moitra (MIT)
Anup Rao (Adobe Research)
Sankeerth Rao (UCSD)
Mahdi Soltanolkotabi (USC)
Gregory Valiant (Stanford)
Santosh Vempala (GaTech)
David Woodruff (CMU)
Tentative Schedule
Monday
9:30–9:40    Opening Remarks
9:40–11:40   Ilias Diakonikolas: Algorithmic High-Dimensional Robust Statistics [pdf]
12:00–12:30  Daniel Kane: Good Sets and Sample Complexity [pdf]
12:30–2:00   Lunch offered at TTI
2:00–4:00    Pravesh Kothari: SoS Method for Robust Mean Estimation [pdf]
4:30–5:30    David Woodruff: Sketching for M-Estimators and Robust Numerical Linear Algebra [pdf]
Tuesday
9:30–10:00   Daniel Kane: Robust Covariance Estimation [pdf]
10:15–12:00  Sam Hopkins: SoS Method for Robust Estimation of Higher Moments [pdf] [board photo 1] [board photo 2]
12:00–2:00   Lunch (on your own)
2:00–2:45    Daniel Kane: Computational Lower Bounds for Robust Estimation [pdf] or [pdf]
3:00–4:00    Daniel Kane: List-Decodable Learning and Learning Mixtures of Distributions [pdf]
4:30–5:30    Open Problems Session
Wednesday
9:30–10:15   Chao Gao: Robust Estimation: Optimal Rates, Adaptation and Computation [pdf]
10:45–11:30  Po-Ling Loh: Robust estimation via (non)convex M-estimation [pdf]
12:00–2:00   Lunch (on your own)
2:00–2:45    Stas Minsker: Statistical estimation with heavy-tailed data [pdf]
2:45–5:00    Working groups / Collaboration
Thursday
9:30–10:30   Pranjal Awasthi: Malicious PAC Learning of Halfspaces [pdf]
10:45–11:15  Ilias Diakonikolas: Robust PAC Learning of Geometric Concepts [pdf]
11:30–12:15  Prateek Jain: Hard Thresholding Method for Robust Regression
12:30–1:30   Nina Balcan (TTIC Seminar): Foundations of Data-Driven Algorithm Design [pdf]
2:00–5:00    Working groups / Social event
Friday
9:30–10:20   Gautam Kamath: Beyond Theory: Realizing Robustness [pdf]
10:20–2:00   Working groups and Lunch
2:00–2:30    Weihao Kong: Algorithms and Lower Bounds for Robust Linear Regression [pdf]
2:45–3:15    Yu Cheng: Faster Algorithms for Robust Estimation [pdf]
3:15–5:30    Working groups / Open Problems