Fitting a model to a collection of observations is one of the quintessential problems in machine learning.
The standard assumption is that the data were generated by a model of a given type (e.g., a mixture model).
This simplifying assumption is at best only approximately valid, as
real datasets are typically exposed to some source of contamination.
Hence, any estimator designed for a particular model that is to be used in practice
must also be robust
in the presence of model misspecification. This is the prototypical goal of
robust statistics, a field that took shape in the 1960s
with the pioneering works of Tukey and Huber.
Until recently, even for the basic problem
of robustly estimating the mean of a high-dimensional dataset,
all known robust estimators were hard to compute.
Moreover, the quality of the common heuristics (e.g., RANSAC) degrades badly as the dimension increases.
This state of affairs prompted the following natural question:
Can we reconcile robustness and computational efficiency in high-dimensional estimation?
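To make the difficulty concrete, here is a small, purely illustrative sketch (the setup, numbers, and code are our own, not part of the workshop materials). With an ε-fraction of adversarial outliers, the empirical mean of a d-dimensional Gaussian sample can be dragged arbitrarily far from the truth, while even the classical coordinate-wise median incurs error that grows like ε√d:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eps = 100, 1000, 0.1          # dimension, sample size, contamination rate

# Inliers: standard Gaussians with true mean 0.
inliers = rng.standard_normal((int(n * (1 - eps)), d))
# Adversarial outliers: all placed at a single far-away point.
outliers = np.full((int(n * eps), d), 5.0)
sample = np.vstack([inliers, outliers])

naive = sample.mean(axis=0)          # empirical mean: shifted by roughly eps * 5 * sqrt(d)
cwise = np.median(sample, axis=0)    # coordinate-wise median: shifted by O(eps) per coordinate

print(f"error of empirical mean:         {np.linalg.norm(naive):.2f}")
print(f"error of coordinate-wise median: {np.linalg.norm(cwise):.2f}")
```

The coordinate-wise median beats the naive mean here, but its error still scales with √d; achieving dimension-independent error with computationally efficient estimators is precisely the gap that the work discussed below closes.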
A recent line of work in computer science has sought to circumvent this apparent computational
barrier for a range of natural generative models. Specifically, two concurrent works gave the first provably robust and
efficiently computable estimators for several fundamental questions that were previously thought to be computationally intractable.
These include robust estimation of mean and covariance in high dimensions, and robust learning of various latent variable models.
Since the dissemination of these works, there has been a flurry of research activity on algorithmic robust estimation
in a variety of high-dimensional settings. Moreover, the developed ideas have been useful in
several practical applications, including finding patterns in genomic datasets
and improving the security of machine learning systems.
Goals and Target Audience
The notion of robustness lies at the core of machine learning.
The first objective of the workshop will be to introduce the local machine learning community
to the new insights and techniques in the exciting area of algorithmic robust statistics.
During the last couple of years, several algorithmic techniques
have been developed for robust high-dimensional estimation by various groups
of researchers. There are intriguing connections between these techniques,
and the relationships among them can best be drawn out when all of them are presented together.
By bringing a number of experts together, we expect to foster collaboration
and open exchange of ideas that will significantly push the boundary of the field.
This includes both circumventing technical obstacles and discussing new directions
and opportunities for future work.
Specifically, we will examine the following concrete questions:
- What are the limits of current algorithmic techniques, and can we develop new techniques
that yield efficient robust estimators for richer families of problems and models?
- Is there a formal connection between adversarial robustness and other notions of
algorithmic stability, such as differential privacy and adaptive data analysis?
- Can we leverage insights from the theory of robustness to improve the security
of machine learning systems against adversarial attacks?
List of Participants
Pranjal Awasthi (Rutgers)
Sivaraman Balakrishnan (CMU)
Nina Balcan (CMU)
Moses Charikar (Stanford)
Yu Cheng (Duke)
Ilias Diakonikolas (USC)
Chao Gao (Chicago)
Sam Hopkins (Cornell)
Prateek Jain (MSR)
Gautam Kamath (MIT)
Daniel Kane (UCSD)
Ravi Kannan (MSR)
Weihao Kong (Stanford)
Pravesh Kothari (Princeton)
Po-Ling Loh (UW Madison)
Stas Minsker (USC)
Ankur Moitra (MIT)
Anup Rao (Adobe Research)
Sankeerth Rao (UCSD)
Mahdi Soltanolkotabi (USC)
Gregory Valiant (Stanford)
Santosh Vempala (GaTech)
David Woodruff (CMU)
Schedule

Day 1

| Time | Speaker | Talk |
|------|---------|------|
| 9:30 - 9:40 | | Opening Remarks |
| 9:40 - 11:40 | Ilias Diakonikolas | Algorithmic High-Dimensional Robust Statistics |
| 12:00 - 12:30 | Daniel Kane | Good Sets and Sample Complexity |
| 12:30 - 2:00 | | Lunch offered at TTI |
| 2:00 - 4:00 | Pravesh Kothari | SoS Method for Robust Mean Estimation |
| 4:30 - 5:30 | David Woodruff | Sketching for M-Estimators and Robust Numerical Linear Algebra |

Day 2

| Time | Speaker | Talk |
|------|---------|------|
| 9:30 - 10:00 | Daniel Kane | Robust Covariance Estimation |
| 10:15 - 12:00 | Sam Hopkins | SoS Method for Robust Estimation of Higher Moments |
| 12:00 - 2:00 | | Lunch (on your own) |
| 2:00 - 2:45 | Daniel Kane | Computational Lower Bounds for Robust Estimation |
| 3:00 - 4:00 | Daniel Kane | List-Decodable Learning and Learning Mixtures of Distributions |
| 4:30 - 5:30 | | Open Problems Session |

Day 3

| Time | Speaker | Talk |
|------|---------|------|
| 9:30 - 10:15 | Chao Gao | Robust Estimation: Optimal Rates, Adaptation and Computation |
| 10:45 - 11:30 | Po-Ling Loh | Robust estimation via (non)convex M-estimation |
| 12:00 - 2:00 | | Lunch (on your own) |
| 2:00 - 2:45 | Stas Minsker | Statistical estimation with heavy-tailed data |
| 2:45 - 5:00 | | |

Day 4

| Time | Speaker | Talk |
|------|---------|------|
| 9:30 - 10:30 | Pranjal Awasthi | Malicious PAC Learning of Halfspaces |
| 10:45 - 11:15 | Ilias Diakonikolas | Robust PAC Learning of Geometric Concepts |
| 11:30 - 12:15 | Prateek Jain | Hard Thresholding method for Robust Regression |
| 12:30 - 1:30 | Nina Balcan (TTIC Seminar) | Foundations of Data Driven Algorithm Design |
| 2:00 - 5:00 | | Working groups / Social event |

Day 5

| Time | Speaker | Talk |
|------|---------|------|
| 9:30 - 10:20 | Gautam Kamath | Beyond Theory: Realizing Robustness |
| 10:20 - 2:00 | | Working groups and Lunch |
| 2:00 - 2:30 | Weihao Kong | Algorithms and Lower Bounds for Robust Linear Regression |
| 2:45 - 3:15 | Yu Cheng | Faster Algorithms for Robust Estimation |
| 3:15 - 5:30 | | Working groups / Open Problems |