Introduction

The main focus of my research is on the algorithmic foundations of machine learning and statistics. I seek to understand when and how we can design efficient, reliable, and provably robust algorithms for extracting information from complex data. My work spans several areas—including algorithmic robust statistics, supervised learning, foundations of deep learning, statistical-computational tradeoffs, distribution testing, learning structured distributions, and applied probability—unified by a common goal: to bridge the gap between statistical and computational efficiency. I am broadly interested in developing methods that achieve strong theoretical guarantees while informing the design of practical, trustworthy learning systems.

This page offers an overview of my research contributions over the past fifteen years across several interrelated areas. Each section begins with a brief summary capturing the main ideas and motivation, followed by a more detailed discussion that outlines the key results and representative papers.

Algorithmic Robust Statistics

Brief Overview

Real-world datasets are rarely clean: outliers, corruptions, and systematic noise often compromise the reliability of standard statistical methods. The key challenge is to build algorithms that continue to perform accurately and efficiently, even when part of the data has been distorted or adversarially corrupted.
My work established the field of algorithmic robust statistics, which studies how to design algorithms that are both statistically and computationally efficient in the presence of data corruption. In a 2016 paper, we developed the first efficient methods for robust mean and covariance estimation in high dimensions that nearly match classical information-theoretic guarantees, resolving longstanding open problems.

Subsequent work broadened this framework to a range of settings—including mixture and graphical models and supervised learning—while generalizing and refining the algorithmic framework introduced in 2016 and uncovering intriguing connections to geometry, optimization, and computational complexity. Several of these methods, particularly those based on the iterative filtering technique, have been implemented and evaluated on real datasets, achieving state-of-the-art performance in practical tasks such as defenses against data poisoning and backdoor attacks, automatic outlier removal in genetic datasets, and out-of-distribution detection.

More broadly, robust statistics has become a lens for studying algorithmic problems that at first appear unrelated to data corruptions. Specifically, it has inspired the development of new algorithmic techniques in other areas, including in learning mixture models and natural classes of neural networks.
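To give a flavor of the iterative filtering technique mentioned above, here is a minimal sketch of spectral filtering for robust mean estimation (my simplified illustration, not the exact procedure or thresholds from the 2016 paper): as long as the empirical covariance has an abnormally large eigenvalue, the data cannot all come from a well-behaved distribution, so the points with the largest projections along the top eigenvector are removed and the process repeats.

```python
import numpy as np

def filtered_mean(X, eps, spectral_threshold=2.0, max_iters=100):
    """Simplified spectral filter for robust mean estimation.

    X: (n, d) array of samples, an eps-fraction of which may be adversarially
    corrupted; the inliers are assumed to have (roughly) identity covariance.
    While the top eigenvalue of the empirical covariance is too large, remove
    the points that project most heavily onto the top eigenvector.
    NOTE: the threshold and removal rule here are illustrative placeholders;
    the provable algorithms use carefully calibrated, data-dependent scores.
    """
    X = np.asarray(X, dtype=float)
    for _ in range(max_iters):
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        if eigvals[-1] <= spectral_threshold:   # small top eigenvalue certifies accuracy
            return mu
        scores = ((X - mu) @ eigvecs[:, -1]) ** 2
        keep = scores <= np.quantile(scores, 1.0 - eps)  # drop the most extreme eps-fraction
        if keep.all():
            return mu
        X = X[keep]
    return X.mean(axis=0)
```

The intuition, which the theory makes quantitative, is that a bounded fraction of outliers can shift the empirical mean substantially only by inflating the empirical covariance in some direction, so a spectral certificate of "no large eigenvalue" doubles as a certificate of accuracy.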

Highlights
Basic Results and Survey of the Field

The following work gave the first computationally efficient algorithms for robust estimation in high dimensions, settling a fundamental problem in Robust Statistics that had remained open since the early 1960s.

Robust Estimators in High Dimensions without the Computational Intractability arXiv
I. Diakonikolas, G. Kamath, D. Kane, J. Li, A. Moitra, A. Stewart
FOCS 2016, SICOMP 2019

At the technical level, the paper introduced a general methodology for designing provably robust efficient estimators that has been applied to a wide range of settings.
A brief expository article based on the above work appeared in the research highlights section of the Communications of the ACM:

Robustness Meets Algorithms PDF
I. Diakonikolas, G. Kamath, D. Kane, J. Li, A. Moitra, A. Stewart
Communications of the ACM, Research Highlights, 2021

The focus of my subsequent work in this area has been to develop a general algorithmic theory of robustness. A detailed exposition of subsequent developments can be found in my book on the topic, which is a good entry point to the field:

Algorithmic High-Dimensional Robust Statistics PDF
I. Diakonikolas, D. Kane
Cambridge University Press, 2023

The following sections provide highlights of my subsequent work in this area, organized into four main research threads.

Robust Algorithms for Complex Settings

After the 2016 work, a natural direction was to go beyond mean and covariance estimation by developing robust algorithms for more complex settings. These include robust supervised learning (both regression and classification), robustly learning various mixture models, and learning with a majority of outliers.

Learning Geometric Concepts with Nasty Noise arXiv
I. Diakonikolas, D. Kane, A. Stewart
STOC 2018
Outlier-robust PAC learning for polynomial threshold functions and intersections of halfspaces; introduces iterative polynomial filtering in robust supervised learning.
List-Decodable Mean Estimation and Learning Mixtures of Spherical Gaussians arXiv
I. Diakonikolas, D. Kane, A. Stewart
STOC 2018
Optimal list-decodable mean estimation of spherical Gaussians and, as a corollary, learning mixtures of spherical Gaussians with near-optimal separation. Introduces iterative multi-filtering technique.
Efficient Algorithms and Lower Bounds for Robust Linear Regression arXiv
I. Diakonikolas, W. Kong, A. Stewart
SODA 2019
First efficient outlier-robust estimators for Gaussian linear regression.
Robustly Learning Any Clusterable Mixture of Gaussians arXiv
I. Diakonikolas, S. Hopkins, D. Kane, S. Karmalkar
FOCS 2020
Outlier-robust learner for mixtures of Gaussians separated in total variation distance.
Robustly Learning Mixtures of $k$ Arbitrary Gaussians arXiv
A. Bakshi, I. Diakonikolas, H. Jia, D. Kane, P. Kothari, S. Vempala
STOC 2022
First efficient outlier-robust learner for mixtures of arbitrary Gaussians.
Near-Linear Time Algorithms

Once polynomial-time solutions for basic robust estimation tasks were known, a natural next step was to design faster and simpler methods. I initiated this direction by giving the first near-linear-time algorithm for high-dimensional robust mean estimation. A subsequent work crystallized a connection to continuous (non-convex) optimization, showing that robust mean estimation can be solved via first-order methods (e.g., gradient descent) applied to an appropriate non-convex objective. Building on these ideas, we obtained almost-linear-time algorithms for learning separated mixture models (including sub-Gaussian and bounded-covariance families), giving the first runtime improvements on this fundamental task since the classical work of Vempala and Wang from 2001. Related advances also led to the first memory-efficient robust learners in the streaming model.
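To make the optimization connection concrete, a schematic form of the non-convex objective (my paraphrase; the precise regularization and constraint set in the ICML 2020 paper differ in details) is

\[
\min_{w \in \Delta_{n,\varepsilon}} \; \lambda_{\max}\!\Big(\sum_{i=1}^{n} w_i\,(x_i-\mu_w)(x_i-\mu_w)^\top\Big),
\qquad \mu_w := \sum_{i=1}^{n} w_i x_i,
\qquad \Delta_{n,\varepsilon} := \Big\{w \ge 0:\ \sum_i w_i = 1,\ \ w_i \le \tfrac{1}{(1-\varepsilon)n}\Big\},
\]

i.e., one searches for a reweighting of the samples whose weighted covariance has small spectral norm. The key message is that approximate stationary points of (a suitable version of) this objective already yield accurate estimates of the mean, which is why simple first-order methods suffice.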

High-Dimensional Robust Mean Estimation in Nearly-Linear Time arXiv
Y. Cheng, I. Diakonikolas, R. Ge
SODA 2019
First near-linear time algorithm for outlier-robust mean estimation.
High-Dimensional Robust Mean Estimation via Gradient Descent arXiv
Y. Cheng, I. Diakonikolas, R. Ge, M. Soltanolkotabi
ICML 2020
Gradient descent on natural non-convex loss efficiently solves robust mean estimation.
Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation arXiv
I. Diakonikolas, D. Kane, D. Kongsgaard, J. Li, K. Tian
STOC 2022
Almost linear-time algorithm for clustering mixtures of separated bounded covariance distributions.
Streaming Algorithms for High-Dimensional Robust Statistics arXiv
I. Diakonikolas, D. Kane, A. Pensia, T. Pittas
ICML 2022
First computationally efficient robust estimators with near-optimal memory requirements in single-pass streaming model.
Robustness as a Lens

Motivated by robustness, this line of work has also led to the development of techniques of broader algorithmic and mathematical interest. One direction establishes covering bounds for algebraic varieties, yielding new algorithmic methods for learning latent variable models. Another develops Sum-of-Squares (SoS) certifiability results—including those for sparse singular values and moment bounds for subgaussian distributions—that yield near-optimal algorithms for a range of well-studied problems, both within robust statistics and beyond. A complementary thread introduces a general method for implicit high-order moment tensor estimation, enabling efficient learning of mixture and broader latent-variable models without explicitly forming large tensors.

Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models arXiv
I. Diakonikolas, D. Kane
FOCS 2020
New structural result in algebraic geometry leading to state-of-the-art algorithms for learning various latent variable models.
SoS Certificates for Sparse Singular Values and Their Applications: Robust Statistics, Subspace Distortion, and More arXiv
I. Diakonikolas, S. Hopkins, A. Pensia, S. Tiegel
STOC 2025
SoS-based algorithms for certifying large sparse singular values. Leads to qualitatively improved algorithms in robust statistics and other areas.
SoS Certifiability of Subgaussian Distributions and Its Algorithmic Applications arXiv
I. Diakonikolas, S. Hopkins, A. Pensia, S. Tiegel
STOC 2025
Moment bounds of any subgaussian distribution can be efficiently certified in the SoS system. Leads to optimal algorithms for various learning tasks involving subgaussian distributions.
Implicit High-Order Moment Tensor Estimation and Learning Latent Variable Models arXiv
I. Diakonikolas, D. Kane
FOCS 2025
General technique of implicit moment tensor estimation. Leads to first polynomial-time algorithms for learning mixtures of spherical Gaussians, mixtures of linear regressions, and sums of non-linear activations.
Bridging Theory and Practice

While much of this research is theoretical, several of the resulting algorithms have proven effective in real-world data analysis tasks. In particular, the iterative filtering framework introduced in the 2016 work has been directly implemented in scalable systems used for automatic outlier removal, data decontamination, and reliability enhancement in large-scale learning pipelines.
Building on this algorithmic foundation, later works extended these ideas to improve the robustness of more general stochastic optimization procedures and develop principled defenses against data and model corruption. For example, they have been used for anomaly detection in genetic datasets, mitigating data-poisoning and backdoor attacks, and improving out-of-distribution detection in deep learning models. This line of work demonstrates how theoretically grounded frameworks for robustness can translate into scalable and reliable methods for modern machine learning systems.
The following papers highlight a few representative collaborative works in this direction; related approaches have since been leveraged across various domains within the broader machine learning community.

Being Robust (in High Dimensions) Can Be Practical arXiv
I. Diakonikolas, G. Kamath, D. Kane, J. Li, A. Moitra, A. Stewart
ICML 2017
Practical implementation of iterative filtering and application to exploratory data analysis.
Sever: A Robust Meta-Algorithm for Stochastic Optimization arXiv
I. Diakonikolas, G. Kamath, D. Kane, J. Li, J. Steinhardt, A. Stewart
ICML 2019
Iterative filtering for outlier-robust stochastic optimization. Applications in defending against data poisoning attacks in ML.
How Does Wild Data Provably Help OOD Detection? arXiv
X. Du, Z. Fang, I. Diakonikolas, Y. Li
ICLR 2024
Application of iterative filtering approach to out-of-distribution detection.
Supervised Learning with Noise

Brief Overview

Supervised learning traditionally assumes that training labels are correct, yet in many settings labels may be corrupted or adversarial. A central challenge is to understand under which noise models accurate learning is possible, and how noise affects the statistical and computational complexity of such problems.
My work has developed a systematic algorithmic theory of learning in the presence of noisy labels, addressing both semi-random models—such as Massart and Tsybakov noise—and fully adversarial label noise. A key result established the first polynomial-time algorithm for distribution-free PAC learning of halfspaces in the presence of Massart noise, resolving a longstanding open problem in computational learning theory.
Building on the distribution-free setting, I subsequently investigated stronger noise models in distribution-specific regimes, including Tsybakov and general Massart noise, as well as adversarial label corruption. These results established both new algorithmic techniques and nearly matching lower bounds, clarifying the precise regimes where efficient learning is possible and where inherent barriers arise. Together, they provide a comprehensive view of how label noise affects the statistical and computational complexity of learning fundamental concept classes.
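For reference, in the Massart (bounded) noise model the label of each example is flipped independently with an instance-dependent probability that is bounded away from \(1/2\): for an unknown target halfspace \(f\) and a noise parameter \(\eta < 1/2\),

\[
\Pr[\,y \neq f(x) \mid x\,] \;=\; \eta(x) \;\le\; \eta \;<\; \tfrac{1}{2},
\]

where the flipping probabilities \(\eta(x)\) are unknown and may be chosen adversarially. Tsybakov noise relaxes this by allowing \(\eta(x)\) to approach \(1/2\), subject to a tail condition controlling how often it does so, while adversarial (agnostic) label noise places no structural restriction on the corrupted labels.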

Highlights
Distribution-free Learning

This line of work concerns learning under label noise without any assumptions on the underlying data distribution. It began with the first provably efficient algorithm for distribution-free PAC learning of halfspaces under Massart noise, and continued with refinements that clarified both the algorithmic landscape and its fundamental limitations.

Distribution-Independent PAC Learning of Halfspaces with Massart Noise arXiv
I. Diakonikolas, T. Gouleakis, C. Tzamos
NeurIPS 2019
First efficient distribution-free algorithm under Massart noise.
Boosting in the Presence of Massart Noise arXiv
I. Diakonikolas, R. Impagliazzo, D. Kane, R. Lei, J. Sorrell, C. Tzamos
COLT 2021
First efficient boosting technique in the presence of Massart noise.
Near-Optimal Statistical Query Hardness of Learning Halfspaces with Massart Noise arXiv
I. Diakonikolas, D. Kane
COLT 2022
First SQ lower bounds matching the algorithmic guarantees.
Cryptographic Hardness of Learning Halfspaces with Massart Noise arXiv
I. Diakonikolas, D. Kane, P. Manurangsi, L. Ren
NeurIPS 2022
Shows hardness via a reduction from LWE.
A Near-Optimal Algorithm for Learning Margin Halfspaces with Massart Noise arXiv
I. Diakonikolas, N. Zarifis
NeurIPS 2024
Sample near-optimal and efficient algorithm for learning Massart halfspaces with a margin.

Beyond noise tolerance, this line of work led to progress on a classical problem in the theory of computation: the construction of Forster transforms. These are transformations that regularize data without changing its essential geometry, and have long played a role in functional analysis, communication complexity, coding theory, and more. Motivated by a connection to the problem of learning halfspaces with label noise, we developed the first strongly polynomial-time algorithm for computing such transforms, which in turn yields a strongly polynomial distribution-free learner for halfspaces—a surprising result, given that proper PAC learning of halfspaces is equivalent to linear programming.
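Concretely (a sketch, omitting the mild general-position conditions under which it exists), a Forster transform of points \(x_1,\dots,x_n \in \mathbb{R}^d\) is an invertible linear map \(A\) such that, after applying \(A\) and projecting each point to the unit sphere, the resulting points are in isotropic position:

\[
\frac{1}{n}\sum_{i=1}^{n} \frac{(Ax_i)(Ax_i)^\top}{\|Ax_i\|_2^2} \;=\; \frac{1}{d}\, I_d .
\]

The STOC 2023 paper below computes an approximate version of such a transform in strongly polynomial time, meaning, roughly, that the number of arithmetic operations is bounded by a polynomial in \(n\) and \(d\) alone, independent of the bit complexity of the input.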

A Strongly Polynomial Algorithm for Approximate Forster Transforms and its Application to Halfspace Learning arXiv
I. Diakonikolas, C. Tzamos, D. Kane
STOC 2023
First strongly polynomial algorithm for Forster transforms, yielding first strongly polynomial distribution-free learner for halfspaces.
Distribution-specific Learning

This line of work examines learning with label noise under additional assumptions on the input distribution—such as Gaussian or log-concave marginals—with the goal of leveraging geometric structure to obtain better algorithms. These results extend the theory beyond the distribution-free setting and demonstrate that distributional structure can enable efficient algorithms that are provably impossible in the distribution-free regime.

Learning Halfspaces with Massart Noise under Structured Distributions arXiv
I. Diakonikolas, V. Kontonis, C. Tzamos, N. Zarifis
COLT 2020
First efficient learning algorithm for Massart halfspaces under a broad family of structured distributions
Non-Convex SGD Learns Halfspaces with Adversarial Label Noise arXiv
I. Diakonikolas, V. Kontonis, C. Tzamos, N. Zarifis
NeurIPS 2020
Gradient descent is robust to adversarial label noise for learning halfspaces under structured distributions.
Learning Halfspaces with Tsybakov Noise arXiv
I. Diakonikolas, V. Kontonis, C. Tzamos, N. Zarifis
STOC 2021
First efficient learner for halfspaces with Tsybakov noise, a strengthening of Massart noise.
A Polynomial-Time Algorithm for Learning Halfspaces with Tsybakov Noise arXiv
I. Diakonikolas, D. Kane, V. Kontonis, C. Tzamos, N. Zarifis
STOC 2021
Fully polynomial-time algorithm in the context of the companion paper above.
Learning General Halfspaces with General Massart Noise under the Gaussian Distribution arXiv
I. Diakonikolas, D. Kane, V. Kontonis, C. Tzamos, N. Zarifis
STOC 2022
Characterizes the complexity of learning halfspaces with the most general form of semi-random noise in the Gaussian setting.
Super Non-Singular Decompositions of Polynomials and their Application to Robustly Learning Low-Degree PTFs arXiv
I. Diakonikolas, D. Kane, V. Kontonis, S. Liu, N. Zarifis
STOC 2024
Develops novel techniques for robustly learning low-degree polynomial threshold functions under the Gaussian distribution, broadening noise-tolerant learning beyond linear models.
Foundations of Deep Learning

Brief Overview

Deep neural networks achieve remarkable empirical success, yet our theoretical understanding of when such models can be learned efficiently remains limited. My work has approached this question through single-index models (SIMs) and multi-index models (MIMs)—theoretical frameworks that isolate core nonlinear phenomena in a mathematically tractable form. The central aim is to determine when these models admit computationally efficient learning algorithms, and when inherent barriers—statistical or computational—make learning impossible, including in the presence of label noise or adversarial perturbations.
Within this framework, I have developed algorithms and computational lower bounds that delineate the feasibility of learning nonlinear predictors—from individual neurons to multi-index representations.

Highlights
Single-Index Models

A generalized linear model (GLM) has the form \(f(x)=\sigma(w\cdot x)\) with a known activation \(\sigma\) (e.g., ReLU or sigmoid), while a single-index model (SIM) keeps the same form but allows \(\sigma\) to be unknown, typically restricted to a structured family (e.g., all monotone functions). My work investigates when such nonlinear predictors can be learned by efficient algorithms under realistic data imperfections (agnostic/adversarial label noise), and clarifies where computational barriers arise.
This line of work develops robust algorithms for GLMs and SIMs under adversarial label noise. The algorithmic results in this thread achieve a constant-factor approximation to the optimal loss—the strongest guarantee possible for polynomial-time algorithms in the agnostic setting.
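Schematically, writing \(L(w) = \mathbb{E}_{(x,y)}\big[(\sigma(w\cdot x)-y)^2\big]\) for the squared loss and \(\mathrm{OPT} = \min_{w} L(w)\) for the best loss achievable within the class, the guarantee takes the form

\[
L(\hat{w}) \;\le\; C\cdot \mathrm{OPT} + \varepsilon
\]

for a universal constant \(C\) (the exact loss function and error terms vary across the papers below).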

The following paper showed that gradient-based methods can succeed in the agnostic setting, for a broad family of known activations \(\sigma\), including ReLUs and sigmoids.

Learning a Single Neuron with Adversarial Label Noise via Gradient Descent arXiv
I. Diakonikolas, V. Kontonis, C. Tzamos, N. Zarifis
COLT 2022
First gradient-based robust learner for a natural subclass of monotone activations.
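As a toy illustration of the gradient-based approach (a simplified sketch; the papers analyze carefully initialized and projected variants under specific distributional assumptions, not this exact loop), the core update is plain gradient descent on the empirical square loss of a single ReLU neuron:

```python
import numpy as np

def learn_relu_neuron(X, y, lr=0.05, epochs=200, seed=0):
    """Toy gradient descent on (1/n) * sum_i (relu(w . x_i) - y_i)^2.

    X: (n, d) inputs, y: (n,) labels, a fraction of which may be adversarially
    corrupted.  Illustrative only: the provable results use specific
    initialization, step sizes, and projections not reproduced here.
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w = rng.normal(size=d)
    w /= np.linalg.norm(w)                      # start on the unit sphere
    for _ in range(epochs):
        preds = np.maximum(X @ w, 0.0)          # ReLU(w . x_i)
        active = (preds > 0).astype(float)      # derivative of the ReLU
        grad = (2.0 / n) * (X.T @ ((preds - y) * active))
        w -= lr * grad
    return w
```

The analyses in this line of work show that, under suitable distributional assumptions, such first-order dynamics converge to a weight vector whose loss is competitive with the best neuron in the class, even though the labels are partially adversarial.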

Subsequent work developed alternative algorithmic approaches—using a convex surrogate whose analysis leverages a connection to local error bounds from optimization—to obtain robust learning guarantees under much milder distributional assumptions:

Robustly Learning a Single Neuron via Sharpness arXiv
P. Wang, N. Zarifis, I. Diakonikolas, J. Diakonikolas
ICML 2023
Convex-surrogate framework analyzed via sharpness for broad distributions in the agnostic model.

The next advance gave general algorithms for all monotone activations (still with known activation), assuming Gaussian inputs:

Robustly Learning Monotone Generalized Linear Models via Data Augmentation arXiv
N. Zarifis, P. Wang, I. Diakonikolas, J. Diakonikolas
COLT 2025
First efficient and robust learner for all monotone GLMs assuming Gaussian inputs.

Finally, we move from GLMs to SIMs, where the activation is unknown. This shift makes the problem more challenging and calls for new techniques beyond gradient methods. The following work develops an optimization framework guided by a problem-specific vector field:

Robustly Learning Monotone Single-Index Models arXiv
P. Wang, N. Zarifis, I. Diakonikolas, J. Diakonikolas
NeurIPS 2025
First efficient agnostic learner for general monotone SIMs under Gaussian inputs.
Multi-Index Models and Representation Learning

While single-index models capture the behavior of individual neurons, modern learning tasks require the recovery of multiple latent features. Extending our algorithmic tools to multi-index models marks a fundamental shift where new computational challenges arise.
My work in this domain investigates when multi-index models (MIMs)—a broad class encompassing shallow neural networks, intersections of halfspaces, multiclass linear classifiers, and many other non-linear predictors—can be learned efficiently.
The core challenge is to determine the algorithmic feasibility of such models and to understand when inherent computational barriers arise. In a sequence of works, I developed algorithms and lower bounds for MIMs, qualitatively characterizing the complexity of learning, both in the realizable setting and in the presence of noisy labels.
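Schematically, a multi-index model is a function of the form

\[
f(x) \;=\; g(Wx), \qquad W \in \mathbb{R}^{K\times d},\ \ K \ll d,
\]

where the link function \(g\) may be unknown or only loosely structured; the learning problem is essentially that of recovering (an approximation to) the low-dimensional relevant subspace spanned by the rows of \(W\), together with a predictor defined on it.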

A natural first step beyond SIMs is learning positive linear combinations of ReLUs (and other natural activations), which already correspond to natural depth-2 networks with multiple hidden units. My work established that such models are efficiently learnable in the PAC framework:

Algorithms and SQ Lower Bounds for PAC Learning One-Hidden-Layer ReLU Networks arXiv
I. Diakonikolas, D. Kane, V. Kontonis, N. Zarifis
COLT 2020
First non-trivial PAC learning algorithm for positive combinations of ReLUs.

The computational complexity of the above algorithm scales exponentially with the number of hidden units. In the following paper, inspired by robustness, we developed an algebraic geometry approach leading to quasi-polynomial dependence:

Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models arXiv
I. Diakonikolas, D. Kane
FOCS 2020
Polynomial-cover technique with quasi-polynomial complexity dependence on number of hidden neurons.

The next advance removed positivity constraints, yielding algorithms for general linear combinations of ReLUs with stronger accuracy guarantees:

Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials arXiv
I. Diakonikolas, D. Kane
COLT 2024
Gives an algorithm for this class with optimal complexity in the Correlational SQ model.

The next stage tackled fully general MIMs, beginning with discrete-valued targets:

Robust Learning of Multi-Index Models via Iterative Subspace Approximation arXiv
I. Diakonikolas, G. Iakovidis, D. Kane, N. Zarifis
FOCS 2025
First algorithm and SQ lower bound for discrete-valued MIMs, with or without label noise, with optimal dependence on dimension. Applications to multiclass linear classification and learning intersections of halfspaces, both in the presence of adversarial label noise.

A companion result addressed real-valued MIMs, establishing analogous guarantees:

Algorithms and SQ Lower Bounds for Robustly Learning Real-Valued Multi-Index Models arXiv
I. Diakonikolas, G. Iakovidis, D. Kane, L. Ren
NeurIPS 2025
Provides an efficient algorithm and matching SQ hardness result for real-valued MIMs. Implies state-of-the-art learner for bounded-depth ReLU networks.

A related line of work investigated the power of access models, showing that query access can strictly improve learnability over random sampling. A representative paper is given below:

Agnostically Learning Multi-Index Models with Queries arXiv
I. Diakonikolas, D. Kane, V. Kontonis, C. Tzamos, N. Zarifis
FOCS 2024
Demonstrates that queries can outperform sampling in the context of learning MIMs.
Statistical-Computational Tradeoffs

Brief Overview

Many fundamental learning problems admit statistically optimal solutions, yet it remains unclear when such solutions can be achieved efficiently. Understanding these statistical-computational tradeoffs is central to my work. There are two general approaches to establishing such barriers: (1) proving unconditional lower bounds for broad classes of algorithms (e.g., SQ algorithms, Sum-of-Squares algorithms), and (2) developing efficient reductions from average-case problems believed to be hard. My research has primarily advanced the first approach, and has established connections between the two approaches.
A significant component of my work in this area is based on the Statistical Query (SQ) framework, using the Non-Gaussian Component Analysis (NGCA) methodology introduced in my FOCS 2017 paper. This framework has been leveraged to prove strong, and in many cases optimal, SQ lower bounds for a wide range of learning tasks, including robust estimation, learning mixture models, and supervised learning (both classification and regression).
More recently, my work in this area has expanded beyond Statistical Queries to other computational models, including Polynomial Threshold Function (PTF) tests and Sum-of-Squares (SoS) algorithms, and has also led to reduction-based hardness results inspired by SQ constructions.

Highlights

This line of work investigates when learning problems are solvable in principle, yet remain intractable for efficient algorithms. A central focus is on establishing unconditional lower bounds for broad classes of algorithms, primarily through the Statistical Query (SQ) framework and the Non-Gaussian Component Analysis (NGCA) methodology introduced in the following paper.
These techniques provide a unified lens for proving computational barriers across a diverse range of learning settings, and have also inspired subsequent work on reduction-based hardness and lower bounds in other algorithmic models such as Sum-of-Squares (SoS) and Polynomial Threshold Function (PTF) tests.

Basic Framework

The NGCA framework for proving SQ lower bounds was introduced in the following paper. Perhaps surprisingly, this methodology was inspired largely by questions in robust statistics.

Statistical Query Lower Bounds for Robust Estimation of High-dimensional Gaussians and Gaussian Mixtures arXiv
I. Diakonikolas, D. Kane, A. Stewart
FOCS 2017
Introduces the NGCA framework for proving SQ lower bounds and applies it to a range of estimation tasks, including robust mean and covariance estimation, Gaussian mixtures, and sparse estimation problems.

Since its introduction, the NGCA methodology has become a flexible tool for establishing statistical-computational tradeoffs across learning theory. It provides a general template for constructing families of hard instances whose low-order moments match Gaussian moments, showing that any algorithm capable of solving these problems efficiently would yield an NGCA solver. A good entry point for understanding these developments (with a complete set of applications through 2023) is Chapter 8 of my Algorithmic Robust Statistics book.
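As a concrete reference point, a standard (informally stated) version of the hidden-direction NGCA problem asks to distinguish the standard Gaussian \(N(0, I_d)\) from a distribution \(P_v^A\) that follows a prescribed univariate distribution \(A\) along an unknown direction \(v\) and is standard Gaussian on the orthogonal complement,

\[
P_v^{A}(x) \;=\; A(v\cdot x)\,\phi_{d-1}\big(x - (v\cdot x)\,v\big),
\]

where \(\phi_{d-1}\) denotes the standard Gaussian density on \(v^{\perp}\). When the first \(k\) moments of \(A\) match those of \(N(0,1)\) (and under mild regularity conditions), any SQ algorithm for this task requires super-polynomial resources, and reductions from this problem transfer the hardness to concrete learning tasks.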
Below are illustrative examples from my work that build on this framework.

Statistical-Computational Tradeoffs for Unsupervised Learning

Beyond the settings considered in the FOCS 2017 paper above, the first group of applications concerns unsupervised problems, including robust estimation and learning mixture models.

Statistical Query Lower Bounds for List-Decodable Linear Regression arXiv
I. Diakonikolas, D. Kane, A. Pensia, T. Pittas, A. Stewart
NeurIPS 2021
Near-optimal SQ lower bounds for list-decodable linear regression. Shows that when a majority of samples are corrupted, known algorithms are qualitatively optimal.
SQ Lower Bounds for Learning Mixtures of Separated and Bounded Covariance Gaussians arXiv
I. Diakonikolas, D. Kane, T. Pittas, N. Zarifis
COLT 2023
Constructs NGCA-based hard instances for learning Gaussian mixtures with separated means and bounded covariance, refining computational limits for clustering problems.
SQ Lower Bounds for Non-Gaussian Component Analysis with Weaker Assumptions arXiv
I. Diakonikolas, D. Kane, L. Ren, Y. Sun
NeurIPS 2023
Strengthens NGCA lower bounds by removing finite chi-squared divergence assumptions, extending the framework to broader inference tasks.
Statistical-Computational Tradeoffs for Supervised Learning

NGCA-based techniques have also yielded lower bounds for a range of supervised learning problems, both in noiseless and noisy settings.

Algorithms and SQ Lower Bounds for PAC Learning One-Hidden-Layer ReLU Networks arXiv
I. Diakonikolas, D. Kane, V. Kontonis, N. Zarifis
COLT 2020
Establishes Correlational SQ lower bounds for learning linear combinations of ReLUs under Gaussian marginals. Our COLT 2024 paper gives a matching upper bound.
The Optimality of Polynomial Regression for Agnostic Learning under Gaussian Marginals arXiv
I. Diakonikolas, D. Kane, T. Pittas, N. Zarifis
COLT 2021
$L_1$ polynomial regression is optimal for agnostically learning Boolean functions under Gaussian marginals. Also develops an analogous SQ hardness framework for agnostically learning real-valued functions.
Near-Optimal Statistical Query Hardness of Learning Halfspaces with Massart Noise arXiv
I. Diakonikolas, D. Kane
COLT 2022
Sharp SQ lower bound for learning halfspaces with Massart noise. Shows that the error guarantee of the algorithm in my NeurIPS 2019 paper is optimal up to constant factors.
Algorithms and SQ Lower Bounds for Robustly Learning Real-Valued Multi-Index Models arXiv
I. Diakonikolas, G. Iakovidis, D. Kane, L. Ren
NeurIPS 2025
Introduces the relativized NGCA framework and establishes optimal SQ lower bounds for it. As a corollary, derives optimal SQ lower bounds for learning multi-index models.
Beyond SQ-hardness

To further understand the robustness of these barriers, subsequent work established hardness not only within the SQ framework, but also through reductions and within other algorithmic models such as SoS and PTF tests.

Cryptographic Hardness of Learning Halfspaces with Massart Noise arXiv
I. Diakonikolas, D. Kane, P. Manurangsi, L. Ren
NeurIPS 2022
Builds on NGCA-inspired SQ-hard instances for Massart halfspaces and establishes similar hardness under standard cryptographic assumptions.
Sum-of-Squares Lower Bounds for Non-Gaussian Component Analysis arXiv
I. Diakonikolas, S. Karmalkar, S. Pang, A. Potechin
FOCS 2024
Establishes lower bounds for NGCA against sum-of-squares algorithms.
PTF Testing Lower Bounds for Non-Gaussian Component Analysis arXiv
I. Diakonikolas, D. Kane, S. Liu, T. Pittas
FOCS 2025
Proves hardness of NGCA against polynomial threshold function tests.
Distribution Testing

Brief Overview

Distribution testing provides the algorithmic foundations for reasoning about data-generating processes. Given sample access to one or more unknown distributions, the goal is to decide whether they satisfy a desired property—such as identity, independence, or structural shape—or are far from satisfying it. My work has developed general and sharp characterizations of the sample complexity of such testing tasks, both in discrete and in structured settings.
For discrete distributions, my results yield sample-optimal algorithms for fundamental problems such as identity, closeness, and independence testing. For structured families—such as monotone, unimodal, log-concave, and mixture distributions—my work introduced the $A_k$-distance framework, an analytic tool that captures the right geometric parameters governing testing complexity and enables near-optimal testers even for continuous distributions.

Highlights

My work in distribution testing is organized into two main threads, depending on whether structural assumptions are imposed on the distribution.

Testing Discrete Distributions

This line of work investigates the sample complexity of testing fundamental properties of discrete distributions, such as identity, closeness, and independence, under minimal assumptions. A recurring theme is obtaining optimal bounds (up to constant factors) and developing techniques that are broadly applicable across testing problems.

Optimal Algorithms for Testing Closeness of Discrete Distributions arXiv
S. Chan, I. Diakonikolas, G. Valiant, P. Valiant
SODA 2014
First optimal algorithms and matching lower bounds for testing closeness between arbitrary discrete distributions. Develops an adaptation of the chi-squared statistic (sketched after this list) and a novel analysis technique that has been leveraged in many subsequent works.
A New Approach for Testing Properties of Discrete Distributions arXiv
I. Diakonikolas, D. Kane
FOCS 2016
General framework for discrete distribution testing, yielding the first sample-optimal testers and matching lower bounds for independence and many other properties.
Testing Conditional Independence of Discrete Distributions arXiv
C. Canonne, I. Diakonikolas, D. Kane, A. Stewart
STOC 2018
Develops techniques to prove tight sample bounds for testing conditional independence.
Optimal Testing of Discrete Distributions with High Probability arXiv
I. Diakonikolas, T. Gouleakis, D. Kane, J. Peebles, E. Price
STOC 2021
New algorithmic and probabilistic techniques, yielding new optimal testers in the high-probability regime.
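To illustrate the flavor of these testers, here is a sketch of the closeness-testing statistic from the SODA 2014 paper above (under Poissonized sampling; the acceptance threshold, which depends on the domain size and accuracy parameter, is omitted):

```python
import numpy as np

def closeness_statistic(X, Y):
    """Chi-squared-style statistic for two-sample (closeness) testing.

    X, Y: per-symbol count vectors over the domain, obtained from (Poissonized)
    samples of the two unknown distributions.  A sketch of the statistic only;
    the full tester compares the value against a carefully chosen threshold.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    num = (X - Y) ** 2 - X - Y       # subtracting X + Y corrects for sampling noise
    den = X + Y
    mask = den > 0                   # symbols never observed contribute zero
    return float(np.sum(num[mask] / den[mask]))
```

Roughly speaking, subtracting \(X_i + Y_i\) in the numerator removes the bias that the naive chi-squared statistic incurs on lightly sampled symbols, which is what allows the tester to work in the sparse-sampling regime.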
Testing Structured Distributions

For structured distribution families, testing under the total variation distance turns out to be possible even for continuous distributions. In contrast to the discrete unstructured setting, the domain size is no longer the right complexity parameter. My work introduced the \(A_k\)-distance framework—a general methodology for identifying the appropriate complexity measure and designing near-optimal testers for a wide range of structured families.
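Informally, the \(A_k\)-distance between two distributions \(p\) and \(q\) on the real line is the largest discrepancy they exhibit on any collection of \(k\) disjoint intervals:

\[
\|p - q\|_{A_k} \;:=\; \max_{I_1,\dots,I_k}\; \sum_{j=1}^{k} \big| p(I_j) - q(I_j) \big|,
\]

where the maximum ranges over all collections of \(k\) disjoint intervals of the domain. The point is that, for shape-constrained families, closeness in \(A_k\)-distance for a modest value of \(k\) already implies closeness in total variation, so the testing complexity is governed by \(k\) rather than by the domain size.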

Testing Identity of Structured Distributions arXiv
I. Diakonikolas, D. Kane, V. Nikishkin
SODA 2015
Introduces the $A_k$-framework, giving optimal algorithms for identity testing under shape constraints such as monotone, unimodal, log-concave, and mixture families.
Optimal Algorithms and Lower Bounds for Testing Closeness of Structured Distributions arXiv
I. Diakonikolas, D. Kane, V. Nikishkin
FOCS 2015
Extends the $A_k$-framework to two-sample (closeness) testing, obtaining optimal algorithms and matching information-theoretic lower bounds.
Testing Identity of Multidimensional Histograms arXiv
I. Diakonikolas, D. Kane, J. Peebles
COLT 2019
Establishes near-optimal identity testers under the $A_k$-distance, combining geometric and analytic tools to handle high-dimensional histogram families.
Testing Closeness of Multivariate Distributions via Ramsey Theory arXiv
I. Diakonikolas, D. Kane, S. Liu
STOC 2024
Develops a Ramsey-theoretic approach to testing closeness under a multivariate generalization of the $A_k$-distance, yielding the first optimal bounds in fixed dimensions.

Beyond shape-constrained families, my work has developed new methodologies for testing in high-dimensional parametric settings, including structured probabilistic models such as Bayesian networks, where dependencies among variables must also be inferred.

Testing Bayesian Networks arXiv
C. Canonne, I. Diakonikolas, D. Kane, A. Stewart
COLT 2017
Provides the first algorithms and lower bounds for testing graphical models, establishing sublinear dependence on the network size.
Learning Structured Distributions

Brief Overview

A central goal of my work on distribution learning is to understand when complex probability distributions can be estimated both statistically optimally and computationally efficiently. This research has developed a unified algorithmic framework for nonparametric density estimation, introducing approximation and Fourier analytic techniques that apply across both discrete and continuous structured settings.
A key thread established the use of piecewise polynomial approximation to learn univariate structured distributions, yielding sample-optimal and near-linear time algorithms, with further applications to data summarization. A second line of work developed efficient algorithms for sums of independent random variables, including Poisson binomial and multinomial distributions, through new moment-matching and Fourier-analytic methods. A third direction focused on structured multivariate distributions—such as log-concave and histogram-based densities—providing the first sample and computationally efficient algorithms for these nonparametric models.

Highlights

Understanding how to efficiently estimate an unknown probability distribution from samples is one of the classical questions in statistics and machine learning. My research on nonparametric distribution learning provides a unified algorithmic perspective on this problem, developing methods that are both sample-optimal and computationally efficient across a range of structured settings. This work combines ideas from approximation theory, harmonic analysis, and convex optimization, and has influenced subsequent developments in both theory and applications.

Distribution Learning via Piecewise Polynomial Approximation

This line of work introduced a general framework for learning arbitrary univariate distributions via piecewise polynomial approximation.
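The guiding principle (stated informally) is that if every distribution in a class is \(\varepsilon\)-close in total variation to some piecewise polynomial density with \(t\) pieces of degree \(d\), then the class can be learned with roughly

\[
\widetilde{O}\!\left(\frac{t(d+1)}{\varepsilon^{2}}\right)
\]

samples, which is optimal up to logarithmic factors; the algorithmic contribution of the papers below is to carry out this fitting in polynomial, and ultimately near-linear, time.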

Efficient Density Estimation via Piecewise Polynomial Approximation arXiv
S. Chan, I. Diakonikolas, R. Servedio, X. Sun
STOC 2014
Introduces the core piecewise-polynomial approximation framework for univariate density estimation and gives the first polynomial-time learning algorithm.
Fast and Near-Optimal Algorithms for Approximating Distributions by Histograms PDF
J. Acharya, I. Diakonikolas, C. Hegde, J. Li, L. Schmidt
PODS 2015
Applies the piecewise-approximation methodology to data summarization.
Sample Optimal Density Estimation in Nearly-Linear Time arXiv
J. Acharya, I. Diakonikolas, C. Hegde, J. Li, L. Schmidt
SODA 2017
Achieves both sample-optimality and near-linear runtime for robust density estimation via piecewise polynomials.
Learning Sums of Independent Random Variables

This thread focused on sums of independent random variables, a fundamental family that includes Poisson binomial and Poisson multinomial distributions. Some of the techniques developed here are of broader interest in probability and algorithmic game theory.

Learning Poisson Binomial Distributions arXiv
C. Daskalakis, I. Diakonikolas, R. Servedio
STOC 2012
First sample and computationally efficient algorithm for learning Poisson binomial distributions.
Learning Sums of Independent Integer Random Variables PDF
C. Daskalakis, I. Diakonikolas, R. O'Donnell, R. Servedio, L.-Y. Tan
FOCS 2013
New limit theorem for sums of independent integer random variables; leveraging it yields a sample-efficient and polynomial-time learning algorithm for this class of distributions.
The Fourier Transform of Poisson Multinomial Distributions and Its Algorithmic Applications arXiv
I. Diakonikolas, D. Kane, A. Stewart
STOC 2016
Refined geometric characterization of Poisson multinomial distributions. Three implications: (1) first sample-optimal and computationally efficient learning algorithms for this distribution family; (2) new multivariate Central Limit Theorems for these distributions; and (3) fastest known algorithms for computing approximate Nash equilibria in anonymous games.
Learning Multivariate Nonparametric Families

The final thread studies structured nonparametric families in any fixed dimension, such as log-concave and histogram-based densities.

Learning Multivariate Log-Concave Distributions arXiv
I. Diakonikolas, D. Kane, A. Stewart
COLT 2017
First sample upper bound for learning multivariate log-concave densities.
Near-Optimal Sample Complexity Bounds for Maximum Likelihood Estimation of Multivariate Log-Concave Densities arXiv
T. Carpenter, I. Diakonikolas, A. Sidiropoulos, A. Stewart
COLT 2018
Near-optimal sample complexity bounds for MLE in the multivariate log-concave setting.
Fast and Sample Near-Optimal Algorithms for Learning Multidimensional Histograms arXiv
I. Diakonikolas, J. Li, L. Schmidt
COLT 2018
First sample and computationally efficient algorithm for learning multidimensional histograms.
Applied Probability and Applications

Brief Overview

This line of work develops new probabilistic limit theorems and analytic tools that lead to efficient algorithms, with applications across learning theory, stochastic optimization, and computational complexity.

Highlights
Pseudorandomness and Derandomization

My early work showed that the Berry–Esseen Central Limit Theorem continues to hold when the summed random variables are only locally independent. This "local independence CLT", a new probabilistic limit theorem, was motivated by questions in computational complexity and derandomization, and it has been a useful tool in learning theory, streaming algorithms, and hardness of approximation.
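The headline statement of the paper below (paraphrased, with the exact logarithmic factors suppressed) is that any \(k\)-wise independent distribution \(D\) over \(\{\pm 1\}^n\) with \(k = \tilde{O}(1/\varepsilon^{2})\) is \(\varepsilon\)-indistinguishable from the uniform distribution \(U\) by every halfspace \(h\):

\[
\Big|\Pr_{x\sim D}\big[h(x)=1\big] - \Pr_{x\sim U}\big[h(x)=1\big]\Big| \;\le\; \varepsilon .
\]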

Bounded Independence Fools Halfspaces arXiv
I. Diakonikolas, P. Gopalan, R. Jaiswal, R. Servedio, E. Viola
FOCS 2009/SICOMP 2010
Shows that limited independence suffices to "fool" linear threshold functions, leading to the first unconditional pseudorandom generators for this class.

A follow-up work extended these techniques to quadratic functions, establishing quantitative CLTs for degree-$2$ polynomials and introducing analytic methods that have been used in several subsequent works.

Bounded Independence Fools Degree-2 Threshold Functions arXiv
I. Diakonikolas, D. Kane, J. Nelson
FOCS 2010
Extends the bounded-independence paradigm to quadratic polynomials, giving new quantitative CLTs and analytic tools.
Analysis of Threshold Functions

This direction provides a refined analytic understanding of polynomial threshold functions, yielding strong quantitative bounds on their sensitivity properties.

Average Sensitivity and Noise Sensitivity of Polynomial Threshold Functions arXiv
I. Diakonikolas, P. Raghavendra, R. Servedio, L.-Y. Tan
STOC 2010/SICOMP 2014
Establishes new bounds on the noise stability of polynomial threshold functions, with implications for agnostic learning.
A Regularity Lemma, and Low-weight Approximators, for Low-degree Polynomial Threshold Functions arXiv
I. Diakonikolas, R. Servedio, L.-Y. Tan, A. Wan
CCC 2010/Theory of Computing 2014
Shows that every degree-$d$ PTF can be decomposed into a constant number of subfunctions such that almost all are close to being regular PTFs.
Learning from Limited Information

A complementary line of work gives the first efficient algorithms for learning halfspaces and, more generally, degree-$d$ polynomial threshold functions (PTFs) in a limited-information setting, where only partial access to the attributes is available. The central structural insight underlying these algorithms is a robust identifiability theorem: if two degree-$d$ PTFs have approximately matching degree-$\le d$ Fourier coefficients (their so-called Chow parameters), then the functions themselves are close in $L_1$ distance.
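For a Boolean function \(f:\{\pm 1\}^n \to \{\pm 1\}\), the degree-\(\le d\) Chow parameters are its low-degree Fourier coefficients,

\[
\hat{f}(S) \;=\; \mathbb{E}_{x}\Big[f(x)\prod_{i\in S}x_i\Big], \qquad |S| \le d,
\]

and the identifiability theorem says, informally, that two degree-\(d\) PTFs whose Chow parameters are close (say, in \(\ell_2\) distance) must agree on all but a small fraction of inputs; this is what makes learning from approximate Chow parameters possible at all.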

Nearly Optimal Solutions for the Chow Parameters Problem and Low-Weight Approximation of Halfspaces arXiv
A. De, I. Diakonikolas, V. Feldman, R. Servedio
STOC 2012/JACM 2014
First efficient algorithm for learning halfspaces from approximations to their degree-$\le 1$ Fourier coefficients; introduces robust identifiability for threshold functions.
Degree-$d$ Chow Parameters Robustly Determine Degree-$d$ PTFs (and Algorithmic Applications) arXiv
I. Diakonikolas, D. Kane
STOC 2019
Generalizes the robust identifiability theorem to degree-$d$ PTFs and gives efficient algorithms for learning them from approximations to their degree-$\le d$ Fourier coefficients.
Geometric Probability

More recently, I have explored high-dimensional geometric probability with algorithmic implications—for example, deriving nearly tight bounds for fitting an ellipsoid to Gaussian random samples, a question at the interface of geometry, optimization, and statistics.
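The underlying question, stated informally, is for which sample sizes \(n\) one can, with high probability over \(n\) i.i.d. standard Gaussian points \(x_1,\dots,x_n \in \mathbb{R}^d\), find a positive semidefinite matrix \(A\) satisfying

\[
x_i^\top A\, x_i \;=\; 1 \quad \text{for all } i=1,\dots,n,
\]

i.e., a centered ellipsoid passing exactly through all of the points. The conjectured threshold is \(n \approx d^2/4\), and the paper below establishes the positive (fitting) direction up to a logarithmic factor of this threshold.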

A Nearly Tight Bound for Fitting an Ellipsoid to Gaussian Random Points arXiv
D. Kane, I. Diakonikolas
COLT 2023
Resolves the ellipsoid fitting conjecture up to a logarithmic factor.