Fitting a model to a collection of observations is one of the quintessential problems in machine learning.
The standard assumption is that the data were generated by a model of a given type (e.g., a mixture model).
This simplifying assumption is at best only approximately valid, as
real datasets are typically exposed to some source of contamination.
Hence, any estimator designed for a particular model that is to be used in practice
must also be robust
in the presence of model misspecification. This is the prototypical goal of
robust statistics, a field that took shape in the 1960s
with the pioneering works of Tukey and Huber.
Until recently, even for the basic problem
of robustly estimating the mean of a high-dimensional dataset,
all known robust estimators were hard to compute.
Moreover, the quality of the common heuristics (e.g., RANSAC) degrades badly as the dimension increases.
This state of affairs prompted the following natural question:
Can we reconcile robustness and computational efficiency in high-dimensional estimation?
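To make the difficulty concrete, here is a small, purely illustrative sketch (the setup, numbers, and code are our own, not part of the workshop materials). With an ε-fraction of adversarial outliers, the empirical mean of a d-dimensional Gaussian sample can be dragged arbitrarily far from the truth, while even the classical coordinate-wise median incurs error that grows like ε√d:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eps = 100, 1000, 0.1          # dimension, sample size, contamination rate

# Inliers: standard Gaussians with true mean 0.
inliers = rng.standard_normal((int(n * (1 - eps)), d))
# Adversarial outliers: all placed at a single far-away point.
outliers = np.full((int(n * eps), d), 5.0)
sample = np.vstack([inliers, outliers])

naive = sample.mean(axis=0)          # empirical mean: shifted by roughly eps * 5 * sqrt(d)
cwise = np.median(sample, axis=0)    # coordinate-wise median: shifted by O(eps) per coordinate

print(f"error of empirical mean:         {np.linalg.norm(naive):.2f}")
print(f"error of coordinate-wise median: {np.linalg.norm(cwise):.2f}")
```

The coordinate-wise median beats the naive mean here, but its error still scales with √d; achieving dimension-independent error with computationally efficient estimators is precisely the gap that the work discussed below closes.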
A recent line of work in computer science has sought to circumvent this apparent computational
barrier for a range of natural generative models. Specifically, two concurrent works gave the first provably robust and
efficiently computable estimators for several fundamental questions that were previously thought to be computationally intractable.
These include robust estimation of mean and covariance in high dimensions, and robust learning of various latent variable models.
Since the dissemination of these works, there has been a flurry of research activity on algorithmic robust estimation
in a variety of high-dimensional settings. Moreover, the developed ideas have been useful in
several practical applications, including finding patterns in genomic datasets
and improving the security of machine learning systems.
Goals and Target Audience
The notion of robustness lies at the core of machine learning.
The first objective of the workshop will be to introduce the local machine learning community
to the new insights and techniques in the exciting area of algorithmic robust statistics.
During the last couple of years, several algorithmic techniques
have been developed for robust high-dimensional estimation by various groups
of researchers. There are intriguing connections between these techniques,
and the relationships among them can best be drawn out when all of them are presented together.
By bringing a number of experts together, we expect to foster collaboration
and open exchange of ideas that will significantly push the boundary of the field.
This includes both circumventing technical obstacles and discussing new directions
and opportunities for future work.
Specifically, we will examine the following concrete questions:
- What are the limits of current algorithmic techniques, and can we develop new techniques
that yield efficient robust estimators for richer families of problems and models?
- Is there a formal connection between adversarial robustness and other notions of
algorithmic stability, such as differential privacy and adaptive data analysis?
- Can we leverage insights from the theory of robustness to improve the security
of machine learning systems against adversarial attacks?
List of Participants
Pranjal Awasthi (Rutgers)
Sivaraman Balakrishnan (CMU)
Nina Balcan (CMU)
Moses Charikar (Stanford)
Yu Cheng (Duke)
Ilias Diakonikolas (USC)
Chao Gao (Chicago)
Sam Hopkins (Cornell)
Prateek Jain (MSR)
Gautam Kamath (MIT)
Daniel Kane (UCSD)
Ravi Kannan (MSR)
Weihao Kong (Stanford)
Pravesh Kothari (Princeton)
Po-Ling Loh (UW Madison)
Stas Minsker (USC)
Ankur Moitra (MIT)
Anup Rao (Adobe Research)
Sankeerth Rao (UCSD)
Mahdi Soltanolkotabi (USC)
Gregory Valiant (Stanford)
Santosh Vempala (GaTech)
David Woodruff (CMU)
Schedule

Day 1

| Time | Speaker | Talk |
|------|---------|------|
| 9:30 - 9:40 | | Opening Remarks |
| 9:40 - 11:40 | Ilias Diakonikolas | Algorithmic High-Dimensional Robust Statistics |
| 12:00 - 12:30 | Daniel Kane | Good Sets and Sample Complexity |
| 12:30 - 2:00 | | Lunch offered at TTI |
| 2:00 - 4:00 | Pravesh Kothari | SoS Method for Robust Mean Estimation |
| 4:30 - 5:30 | David Woodruff | Sketching for M-Estimators and Robust Numerical Linear Algebra |

Day 2

| Time | Speaker | Talk |
|------|---------|------|
| 9:30 - 10:00 | Daniel Kane | Robust Covariance Estimation |
| 10:15 - 12:00 | Sam Hopkins | SoS Method for Robust Estimation of Higher Moments |
| 12:00 - 2:00 | | Lunch (on your own) |
| 2:00 - 2:45 | Daniel Kane | Computational Lower Bounds for Robust Estimation |
| 3:00 - 4:00 | Daniel Kane | List-Decodable Learning and Learning Mixtures of Distributions |
| 4:30 - 5:30 | | Open Problems Session |

Day 3

| Time | Speaker | Talk |
|------|---------|------|
| 9:30 - 10:15 | Chao Gao | Robust Estimation: Optimal Rates, Adaptation and Computation |
| 10:45 - 11:30 | Po-Ling Loh | Robust estimation via (non)convex M-estimation |
| 12:00 - 2:00 | | Lunch (on your own) |
| 2:00 - 2:45 | Stas Minsker | Statistical estimation with heavy-tailed data |
| 2:45 - 5:00 | | |

Day 4

| Time | Speaker | Talk |
|------|---------|------|
| 9:30 - 10:30 | Pranjal Awasthi | Malicious PAC Learning of Halfspaces |
| 10:45 - 11:15 | Ilias Diakonikolas | Robust PAC Learning of Geometric Concepts |
| 11:30 - 12:15 | Prateek Jain | Hard Thresholding method for Robust Regression |
| 12:30 - 1:30 | Nina Balcan (TTIC Seminar) | Foundations of Data Driven Algorithm Design |
| 2:00 - 5:00 | | Working groups / Social event |

Day 5

| Time | Speaker | Talk |
|------|---------|------|
| 9:30 - 10:20 | Gautam Kamath | Beyond Theory: Realizing Robustness |
| 10:20 - 2:00 | | Working groups and Lunch |
| 2:00 - 2:30 | Weihao Kong | Algorithms and Lower Bounds for Robust Linear Regression |
| 2:45 - 3:15 | Yu Cheng | Faster Algorithms for Robust Estimation |
| 3:15 - 5:30 | | Working groups / Open Problems |