Research: My main research interests are in algorithms, learning, statistics, and applied probability.
A major goal of my work is to understand the tradeoff between statistical efficiency, computational efficiency, and robustness
for fundamental problems in machine learning.
I also have strong interests in optimization, game theory, and their connections to learning.
Interested in working with me? Apply to our Ph.D. program.
University of Southern California
Department of Computer Science
941 Bloom Walk, SAL 220
Los Angeles, CA 90089-0781
diakonik_at_usc.edu
My research has been supported by a Marie Curie Career Integration Grant
(CIG) and an EPSRC First Grant.
Book Chapter
Learning Structured Distributions
[abstract] Invited Book Chapter, Handbook of Big Data, Chapman and Hall/CRC, February 2016.
Estimating distributions from samples is a paradigmatic and fundamental unsupervised learning problem
that has been studied in statistics since the late nineteenth century, starting with the pioneering
work of Karl Pearson. During the past couple of decades, there has been a large body of work
in computer science on this topic with a focus on {\em computational efficiency.}
The area of distribution estimation is well-motivated in its own right, and
has seen a recent surge of research activity, in part due to the ubiquity of structured distributions
in the natural and social sciences. Such structural properties of distributions
are sometimes direct consequences of the underlying application problem,
and in other cases they provide a plausible model of the phenomenon under investigation.
In this chapter, we give a survey of both classical and modern techniques for distribution estimation,
with a focus on recent algorithmic ideas developed in theoretical computer science.
These ideas have led to computationally and statistically efficient algorithms for learning broad families of models.
For the sake of concreteness, we illustrate these ideas with specific examples.
Finally, we highlight outstanding challenges and research directions for future work.
Statistical Query Lower Bounds for Robust Estimation of High-dimensional Gaussians and Gaussian Mixtures
[abstract][eccc] I. Diakonikolas, D. Kane, A. Stewart
Manuscript
We prove the first {\em Statistical Query lower bounds} for
two fundamental high-dimensional learning problems involving
Gaussian distributions: (1) learning Gaussian mixture models (GMMs), and (2) robust (agnostic) learning
of a single unknown mean Gaussian. In particular, we show a {\em super-polynomial gap} between the (information-theoretic)
sample complexity and the complexity of {\em any} Statistical Query algorithm for these problems.
Statistical Query (SQ) algorithms are a class of algorithms
that are only allowed to query expectations of functions of the distribution rather than directly access samples.
This class of algorithms is quite broad: with the sole exception of Gaussian elimination over finite fields,
all known algorithmic approaches in machine learning can be implemented in this model.
Our SQ lower bound for Problem (1)
is qualitatively matched by known learning algorithms for GMMs (all of which can be implemented as SQ algorithms).
At a conceptual level, this result implies that -- as far as SQ algorithms are concerned -- the computational complexity
of learning GMMs is inherently exponential
{\it in the dimension of the latent space} -- even though there
is no such information-theoretic barrier. Our lower bound for Problem (2) implies that the accuracy of the robust learning algorithm
in~\cite{DiakonikolasKKLMS16} is essentially best possible among all polynomial-time SQ algorithms.
On the positive side, we give a new SQ learning algorithm for this problem
with optimal accuracy whose running time nearly matches our lower bound.
Both our SQ lower bounds are attained via a unified moment-matching technique that may be useful in other contexts.
Our SQ learning algorithm for Problem (2) relies on a filtering technique that removes outliers based on higher-order tensors.
Our lower bound technique also has implications for related inference problems,
specifically for the problem of robust {\it testing} of an unknown mean Gaussian.
Here we show an information-theoretic lower bound
which separates the sample complexity of the robust testing problem from its non-robust variant.
This result is surprising because such a separation does not exist
for the corresponding learning problem.
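To make the access model concrete, here is a minimal illustrative sketch in Python (the class name, tolerances, and sample sizes are mine, not from the paper) of a simulated Statistical Query oracle: the learner never touches samples directly and only receives expectations of bounded functions, each perturbed by up to an additive tolerance tau.

import numpy as np

class SQOracle:
    """Answers queries E_{x~D}[f(x)] up to an additive tolerance tau (simulated from samples)."""
    def __init__(self, sampler, tau, rng=None):
        self.sampler = sampler            # callable m -> array of m samples from the hidden D
        self.tau = tau
        self.rng = rng or np.random.default_rng(0)

    def query(self, f):
        """f maps a sample to [-1, 1]; the answer is E[f] up to +/- tau."""
        m = int(np.ceil(4.0 / self.tau ** 2))               # enough samples to get within tau w.h.p.
        est = float(np.mean([f(x) for x in self.sampler(m)]))
        return est + self.rng.uniform(-self.tau, self.tau)  # adversarial shift within the tolerance

# Usage: estimate the mean of a hidden one-dimensional Gaussian through SQ queries only.
rng = np.random.default_rng(1)
oracle = SQOracle(lambda m: rng.normal(1.7, 1.0, m), tau=0.01)
approx_mean = 5.0 * oracle.query(lambda x: float(np.clip(x / 5.0, -1.0, 1.0)))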
Efficient and Optimally Robust Learning of High-Dimensional Gaussians
I. Diakonikolas, G. Kamath, D. Kane, J. Li, A. Moitra, A. Stewart
Manuscript
Testing Bayesian Networks
C. Canonne, I. Diakonikolas, D. Kane, A. Stewart
Manuscript
Collision-based Testers are Optimal for Uniformity and Closeness
[abstract][eccc] I. Diakonikolas, T. Gouleakis, J. Peebles, E. Price
Manuscript
We study the fundamental problems of (i) uniformity testing of a discrete distribution,
and (ii) closeness testing between two discrete distributions with bounded $\ell_2$-norm.
These problems have been extensively studied in distribution testing
and sample-optimal estimators are known for them~\cite{Paninski:08, CDVV14, VV14, DKN:15}.
In this work, we show that the original collision-based testers proposed for these problems~\cite{GRdist:00, BFR+:00}
are sample-optimal, up to constant factors.
Previous analyses showed sample complexity upper bounds for these testers that are optimal
as a function of the domain size $n$, but suboptimal by polynomial factors
in the error parameter $\epsilon$. Our main contribution is a new tight analysis
establishing that these collision-based testers are information-theoretically optimal,
up to constant factors, both in the dependence on $n$ and in the dependence on $\epsilon$.
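As a concrete illustration, here is a minimal Python sketch of the collision-based uniformity tester (the threshold below is indicative only; see the paper for the tight constants): count colliding pairs among the samples and compare the empirical collision rate to the uniform baseline $1/n$, using the fact that a distribution $\epsilon$-far from uniform in $\ell_1$ satisfies $\|p\|_2^2 \geq (1+\epsilon^2)/n$.

import random
from collections import Counter
from math import comb

def collision_uniformity_test(samples, n, eps):
    """Accept (return True) if the samples look uniform over {0, ..., n-1}."""
    m = len(samples)
    collisions = sum(comb(c, 2) for c in Counter(samples).values())  # number of colliding pairs
    rate = collisions / comb(m, 2)                                   # estimate of ||p||_2^2
    return rate <= (1.0 + eps ** 2 / 2.0) / n

# Usage (the sample size below is illustrative; the optimal bound is Theta(sqrt(n)/eps^2)):
n, eps = 1000, 0.25
print(collision_uniformity_test([random.randrange(n) for _ in range(3000)], n, eps))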
Robust Learning of Fixed-Structure Bayesian Networks
[abstract][arxiv] I. Diakonikolas, D. Kane, A. Stewart
Manuscript
We investigate the problem of learning Bayesian networks in an agnostic model
where an $\epsilon$-fraction of the samples are adversarially corrupted.
Our agnostic learning model is similar to -- in fact, stronger than -- Huber's
contamination model in robust statistics. In this work, we study the fully observable
Bernoulli case where the structure of the network is given.
Even in this basic setting, previous learning algorithms
either run in exponential time or lose dimension-dependent factors in their
error guarantees.
We provide the first computationally efficient agnostic learning algorithm for this problem
with dimension-independent error guarantees. Our algorithm has polynomial sample complexity,
runs in polynomial time, and achieves error that scales nearly-linearly with the fraction
of adversarially corrupted samples.
Learning Multivariate Log-concave Distributions
[abstract][arxiv] I. Diakonikolas, D. Kane, A. Stewart
Manuscript
We study the problem of estimating multivariate log-concave probability density functions.
We prove the first sample complexity upper bound for learning log-concave densities
on $\mathbb{R}^d$, for all $d \geq 1$. Prior to our work, no upper bound on the
sample complexity of this learning problem was known for the case of $d>3$.
In more detail, we give an estimator that, for any $d \ge 1$ and $\epsilon>0$,
draws $\tilde{O}_d \left( (1/\epsilon)^{(d+5)/2} \right)$ samples from an unknown
target log-concave density on $\mathbb{R}^d$, and outputs a hypothesis that
(with high probability) is $\epsilon$-close to the target, in total variation distance.
Our upper bound on the sample complexity comes close to the known lower bound of
$\Omega_d \left( (1/\epsilon)^{(d+1)/2} \right)$ for this problem.
Efficient Robust Proper Learning of Log-concave Distributions
[abstract][arxiv] I. Diakonikolas, D. Kane, A. Stewart
Manuscript
We study the {\it robust proper learning} of univariate log-concave distributions
(over continuous and discrete domains). Given a set of samples drawn
from an unknown target distribution, we want to compute a log-concave
hypothesis distribution that is as close as possible to the target, in total variation distance.
In this work, we give the first computationally efficient algorithm
for this learning problem. Our algorithm achieves the information-theoretically optimal
sample size (up to a constant factor), runs in polynomial time,
and is robust to model misspecification with nearly-optimal
error guarantees.
Specifically, we give an algorithm that,
on input $n=O(1/\epsilon^{5/2})$ samples from an unknown distribution $f$,
runs in time $\widetilde{O}(n^{8/5})$,
and outputs a log-concave hypothesis $h$ that (with high probability) satisfies
$d_{\mathrm{TV}}(h, f) = O(\mathrm{opt})+\epsilon$, where $\mathrm{opt}$
is the minimum total variation distance between $f$
and the class of log-concave distributions.
Our approach to the robust proper learning problem is quite flexible and may be applicable
to many other univariate distribution families.
Testing Closeness of Structured Distributions over Discrete Domains
[abstract] I. Diakonikolas, D. Kane, V. Nikishkin
Manuscript
We investigate the problem of testing the equivalence between two structured discrete distributions.
Let $\mathcal{D}$ be a family of distributions over a discrete support of size $n$.
Given a set of samples from two distributions $p, q \in \mathcal{D}$,
we want to distinguish (with high probability) between the cases that $p = q$ and $\|p-q\|_1 \geq \epsilon$.
The main contribution of this paper is a new general algorithm for this testing problem
and a nearly matching information-theoretic lower bound. Specifically, the sample complexity
of our algorithm matches our lower bound up to logarithmic factors. As a corollary, we obtain new
near-sample optimal equivalence testers for a wide range of discrete structured distributions in a unified way.
Playing Anonymous Games using Simple Strategies
[abstract][arxiv] Y. Cheng, I. Diakonikolas, A. Stewart
Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2017)
We investigate the complexity of computing approximate Nash equilibria in anonymous games.
Our main algorithmic result is the following: For any $n$-player anonymous game with a bounded
number of strategies and any constant $\delta>0$, an $O(1/n^{1-\delta})$-approximate Nash
equilibrium can be computed in polynomial time.
Complementing this positive result, we show that if there exists any constant $\delta>0$
such that an $O(1/n^{1+\delta})$-approximate equilibrium can be computed in polynomial time,
then there is a fully polynomial-time approximation scheme for this problem.
We also present a faster algorithm that, for any $n$-player $k$-strategy anonymous game,
runs in time $\tilde O((n+k) k n^k)$ and computes an $\tilde O(n^{-1/3} k^{11/3})$-approximate equilibrium.
This algorithm follows from the existence of simple approximate equilibria of anonymous games,
where each player plays one strategy with probability $1-\delta$, for some small $\delta$,
and plays uniformly at random with probability $\delta$.
Our approach exploits the connection between Nash equilibria in anonymous games and Poisson multinomial distributions (PMDs).
Specifically, we prove a new probabilistic lemma establishing the following:
Two PMDs, with large variance in each direction, whose first few moments
are approximately matching are close in total variation distance.
Our structural result strengthens previous work by providing a smooth tradeoff
between the variance bound and the number of matching moments.
Sample Optimal Density Estimation in Nearly-Linear Time
[abstract][pdf] J. Acharya, I. Diakonikolas, J. Li, L. Schmidt
Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2017)
We design a new, fast algorithm for agnostically learning univariate probability distributions
whose densities are well approximated by piecewise polynomial functions.
Let $f$ be the density function of an arbitrary univariate distribution,
and suppose that $f$ is $\mathrm{OPT}$-close in $L_1$ distance to an unknown piecewise polynomial function
with $t$ interval pieces and degree $d$. Our algorithm
draws $m = O(t(d+1)/\epsilon^2)$ samples from $f$, runs in time $\widetilde{O}(m \cdot \mathrm{poly}(d))$, and with probability at least
$9/10$ outputs an $O(t)$-piecewise degree-$d$ hypothesis $h$ that is
$(4\,\mathrm{OPT} + \epsilon)$-close to $f$ in $L_1$ distance.
Our general algorithm yields (near-)sample-optimal and near-linear time estimators for a wide range of structured distribution families
over both continuous and discrete domains in a unified way. For most of our applications, these are the first sample-optimal and near-linear time
estimators in the literature. As a consequence, our work resolves the sample and computational complexities of a broad class of inference
tasks via a single "meta-algorithm". Moreover, we experimentally demonstrate that our algorithm performs very well in practice.
Robust Estimators in High Dimensions without the Computational Intractability
[abstract][arxiv] I. Diakonikolas, G. Kamath, D. Kane, J. Li, A. Moitra, A. Stewart
Proceedings of the 57th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2016)
Invited to the SIAM Journal on Computing Special Issue for FOCS 2016
We study high-dimensional distribution learning in an agnostic setting
where an adversary is allowed to arbitrarily corrupt an $\epsilon$ fraction of the samples.
Such questions have a rich history spanning statistics, machine learning and theoretical computer science.
Even in the most basic settings,
the only known approaches are either computationally inefficient
or lose dimension dependent factors in their error guarantees.
This raises the following question:
Is high-dimensional agnostic distribution learning even possible, algorithmically?
In this work, we obtain the first computationally efficient algorithms with dimension-independent error guarantees
for agnostically learning several fundamental classes of high-dimensional distributions:
(1) a single Gaussian, (2) a product distribution on the hypercube,
(3) mixtures of two product distributions (under a natural balancedness condition),
and (4) mixtures of spherical Gaussians.
Our algorithms achieve error that is independent of the dimension,
and in many cases scales nearly-linearly
with the fraction of adversarially corrupted samples.
Moreover, we develop a general recipe for detecting and correcting corruptions in high dimensions,
that may be applicable to many other problems.
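To convey the flavor of one simple filtering-style approach to this setting, here is a heavily simplified Python sketch (my illustration; the paper's algorithms and guarantees are considerably more involved) for robustly estimating the mean of a Gaussian with identity covariance: while the empirical covariance has an unusually large eigenvalue, project onto the corresponding eigenvector and discard the points that deviate most along that direction.

import numpy as np

def filter_mean(X, eps, slack=10.0):
    """Robust mean estimate of samples X (rows), an eps-fraction of which may be corrupted."""
    X = np.asarray(X, dtype=float)
    while len(X) > 1:
        mu = X.mean(axis=0)
        evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
        if evals[-1] <= 1.0 + slack * eps:          # covariance looks like the identity: stop
            return mu
        v = evecs[:, -1]                            # direction of largest empirical variance
        scores = np.abs((X - mu) @ v)
        X = X[scores <= np.quantile(scores, 1.0 - eps)]   # drop the most extreme eps-fraction
    return X.mean(axis=0)

# Usage: 5% of the points are grossly corrupted, yet the estimate stays near the true mean.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (950, 20)), rng.normal(8.0, 1.0, (50, 20))])
print(np.linalg.norm(filter_mean(X, eps=0.05)))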
A New Approach for Testing Properties of Discrete Distributions
[abstract][pdf][arxiv] I. Diakonikolas, D. Kane
Proceedings of the 57th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2016)
Also see the comments in Oded Goldreich's Choices #188 and #195, and Oded's very nice exposition of our framework.
We study problems in distribution property testing:
Given sample access to one or more unknown discrete distributions,
we want to determine whether they have some global property or are $\epsilon$-far
from having the property in $\ell_1$ distance.
In this paper, we provide a simple and general approach to obtain upper bounds in this setting,
by reducing $\ell_1$-testing to $\ell_2$-testing.
Our reduction yields optimal $\ell_1$-testers, by using a standard $\ell_2$-tester as a black-box.
Using our framework, we obtain sample-optimal and computationally efficient estimators for
a wide variety of $\ell_1$ distribution testing problems, including the following: identity testing to a fixed distribution,
closeness testing between two unknown distributions (with equal/unequal sample sizes),
independence testing (in any number of dimensions), closeness testing for collections of distributions, and testing
$k$-histograms. For most of these problems, we give the first optimal testers in the literature.
Moreover, our estimators are significantly simpler to state and analyze compared to previous approaches.
As our second main contribution, we provide a direct general approach for proving distribution testing lower bounds,
by bounding the mutual information. Our lower bound approach is not restricted to symmetric properties,
and we use it to prove tight lower bounds for the aforementioned problems.
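A simplified Python sketch of the flattening idea behind such a reduction (function names are mine, and the exact parameter settings in the paper differ): use a first batch of samples to split each heavy domain element into several sub-buckets, so that the flattened distribution has small $\ell_2$ norm, and then run any standard $\ell_2$ tester on freshly drawn, flattened samples.

import random
from collections import Counter

def make_flattener(first_batch):
    """Split element x into (1 + #occurrences of x in the first batch) sub-buckets."""
    occ = Counter(first_batch)
    def flatten(x):
        return (x, random.randrange(occ[x] + 1))    # uniform among x's sub-buckets
    return flatten

# Usage: flatten fresh samples from p (and from q, using the same flattener) and feed them
# to an off-the-shelf L2 tester; if the first batch has k samples, the flattened distribution
# has expected squared L2 norm at most 1/(k+1).
first_batch = [random.randrange(50) for _ in range(100)]
flatten = make_flattener(first_batch)
fresh = [flatten(random.randrange(50)) for _ in range(1000)]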
Fast Algorithms for Segmented Regression
[abstract][pdf] J. Acharya, I. Diakonikolas, J. Li, L. Schmidt
Proceedings of the 33rd International Conference on Machine Learning (ICML 2016)
We study the fixed design segmented regression problem:
Given noisy samples from a piecewise linear function $f$,
we want to recover $f$ up to a desired accuracy in mean-squared error.
Previous rigorous approaches for this problem rely on dynamic programming (DP)
and, while sample efficient, have running time quadratic in the sample size.
As our main contribution, we provide new algorithms for this problem
that run in time nearly linear in the sample size and --
while not being minimax optimal --
achieve a significantly better sample-time tradeoff
on large datasets compared to the DP approach.
Our experimental evaluation shows that, compared with the DP approach,
our algorithms provide a convergence rate that is only off by a factor of $2$ to $3$,
while achieving speedups of two orders of magnitude.
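To illustrate the greedy-merging flavor of fast segmented regression, here is a simplified Python sketch (my illustration; the algorithms and guarantees analyzed in the paper differ in important details): start from singleton intervals and, in each round, merge the cheapest half of the candidate adjacent pairs, where the cost of a candidate is the least-squares error of a single linear piece on the merged interval.

import numpy as np

def piece_error(x, y):
    """Least-squares error of one linear piece fit to (x, y)."""
    A = np.column_stack([x, np.ones_like(x)])
    _, res, _, _ = np.linalg.lstsq(A, y, rcond=None)
    return float(res[0]) if res.size else 0.0

def greedy_merging(x, y, k):
    """Return index ranges of a piecewise-linear fit with O(k) pieces."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    segs = [(i, i + 1) for i in range(len(x))]                       # start from singletons
    while len(segs) > 4 * k:
        pairs = [(segs[i][0], segs[i + 1][1]) for i in range(0, len(segs) - 1, 2)]
        errs = [piece_error(x[a:b], y[a:b]) for a, b in pairs]
        cheap = set(np.argsort(errs)[: max(1, len(pairs) // 2)])     # merge the cheapest half
        new_segs = []
        for i in range(0, len(segs), 2):
            if i + 1 < len(segs) and (i // 2) in cheap:
                new_segs.append((segs[i][0], segs[i + 1][1]))
            else:
                new_segs.extend(segs[i:i + 2])
        segs = new_segs
    return segs

# Usage: fit noisy samples of a 2-piece linear function; the sketch returns O(k) intervals.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 400))
y = np.where(x < 0.5, 2 * x, 3 - 4 * x) + rng.normal(0, 0.1, 400)
print(len(greedy_merging(x, y, k=2)))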
Properly Learning Poisson Binomial Distributions in Almost Polynomial Time
[abstract][pdf][arxiv] I. Diakonikolas, D. Kane, A. Stewart
Proceedings of the 29th Annual Conference on Learning Theory (COLT 2016)
We give an algorithm for properly learning Poisson binomial distributions.
A Poisson binomial distribution (PBD) of order $n$
is the discrete probability distribution of the sum of $n$ mutually independent Bernoulli random variables.
Given $\widetilde{O}(1/\epsilon^2)$ samples from an unknown PBD $\mathbf{P}$, our algorithm runs in time
$(1/\epsilon)^{O(\log \log (1/\epsilon))}$, and outputs a hypothesis PBD that is $\epsilon$-close to $\mathbf{P}$ in total variation distance.
The sample complexity of our algorithm is known to be nearly-optimal, up to logarithmic factors, as established
in previous work~\cite{DDS12stoc}. However, the previously best known running time for properly
learning PBDs~\cite{DDS12stoc, DKS15} was $(1/\epsilon)^{O(\log(1/\epsilon))}$, and was essentially obtained by
enumeration over an appropriate $\epsilon$-cover. We remark that the running time of this cover-based approach cannot be
improved, as any $\epsilon$-cover for the space of PBDs has size $(1/\epsilon)^{\Omega(\log(1/\epsilon))}$~\cite{DKS15}.
As one of our main contributions, we provide a novel structural characterization of PBDs,
showing that any PBD $\mathbf{P}$ is $\epsilon$-close to another PBD $\mathbf{Q}$ with $O(\log(1/\epsilon))$ distinct parameters.
More precisely, we prove that, for all $\epsilon >0,$ there exists
an explicit collection $\cal{M}$ of $(1/\epsilon)^{O(\log \log (1/\epsilon))}$ vectors of multiplicities,
such that for any PBD $\mathbf{P}$ there exists a PBD $\mathbf{Q}$ with $O(\log(1/\epsilon))$
distinct parameters whose multiplicities are given by some element of ${\cal M}$,
such that $\mathbf{Q}$ is $\epsilon$-close to $\mathbf{P}.$ Our proof combines tools from Fourier analysis and algebraic geometry.
Our approach to the proper learning problem is as follows:
Starting with an accurate non-proper hypothesis, we fit a PBD to this hypothesis.
More specifically, we essentially start with the hypothesis computed by the
computationally efficient non-proper learning algorithm in our recent work~\cite{DKS15}.
Our aforementioned structural characterization allows
us to reduce the corresponding fitting problem
to a collection of $(1/\epsilon)^{O(\log \log(1/\epsilon))}$
systems of low-degree polynomial inequalities.
We show that each such system can be solved in time $(1/\epsilon)^{O(\log \log(1/\epsilon))}$,
which yields the overall running time of our algorithm.
Optimal Learning via the Fourier Transform for Sums of Independent Integer Random Variables
[abstract][pdf][arxiv][notes] I. Diakonikolas, D. Kane, A. Stewart
Proceedings of the 29th Annual Conference on Learning Theory (COLT 2016)
We study the structure and learnability of sums of independent integer random variables (SIIRVs).
For $k \in \mathbb{Z}_{+}$, a {\em $k$-SIIRV of order $n \in \mathbb{Z}_{+}$} is the probability distribution of the sum of $n$
mutually independent random variables each supported on $\{0, 1, \dots, k-1\}$.
We denote by ${\cal S}_{n,k}$ the set of all $k$-SIIRVs of order $n$.
How many samples are required to learn an arbitrary distribution in ${\cal S}_{n,k}$?
In this paper, we tightly characterize the sample and computational complexity of this problem.
More precisely, we design a computationally efficient algorithm that uses $\widetilde{O}(k/\epsilon^2)$ samples,
and learns an arbitrary $k$-SIIRV within error $\epsilon,$ in total variation distance. Moreover, we show that
the {\em optimal} sample complexity of this learning problem is
$\Theta((k/\epsilon^2)\sqrt{\log(1/\epsilon)}),$ i.e., we prove an upper bound and a matching
information-theoretic lower bound.
Our algorithm proceeds by learning the Fourier transform of the target $k$-SIIRV in its effective support.
Its correctness relies on the {\em approximate sparsity} of the Fourier transform of $k$-SIIRVs --
a structural property that we establish, roughly stating that the Fourier transform of $k$-SIIRVs
has small magnitude outside a small set.
Along the way we prove several new structural results about $k$-SIIRVs.
As one of our main structural contributions, we give an efficient algorithm to construct a
sparse {\em proper} $\epsilon$-cover for ${\cal S}_{n,k},$ in total variation distance.
We also obtain a novel geometric characterization of the space of $k$-SIIRVs. Our
characterization allows us to prove a tight lower bound on the size of $\epsilon$-covers for ${\cal S}_{n,k}$
-- establishing that our cover upper bound is optimal -- and is the key ingredient in our tight sample complexity lower bound.
Our approach of exploiting the sparsity of the Fourier transform in
distribution learning is general, and has recently found additional applications.
In a subsequent work~\cite{DKS15c}, we use a generalization of this idea (in higher dimensions)
to obtain the first efficient learning algorithm for Poisson multinomial distributions.
In~\cite{DKS15b}, we build on this approach to obtain the fastest known proper learning algorithm
for Poisson binomial distributions ($2$-SIIRVs).
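As a rough illustration of the algorithmic idea, here is a simplified Python sketch (the window size and threshold below are chosen by hand, not as in the paper's analysis): estimate the empirical discrete Fourier transform of the samples on a window covering the effective support, discard the small coefficients, and invert.

import numpy as np

def learn_via_dft(samples, window, thresh):
    """Hypothesis pmf on {lo, ..., lo + window - 1} obtained by truncating the empirical DFT."""
    samples = np.asarray(samples)
    lo = int(np.round(samples.mean())) - window // 2
    shifted = np.clip(samples - lo, 0, window - 1)       # window should cover the effective support
    xs = np.arange(window)
    hat_p = np.array([np.mean(np.exp(-2j * np.pi * xi * shifted / window)) for xi in xs])
    hat_p[np.abs(hat_p) < thresh] = 0.0                  # exploit approximate Fourier sparsity
    pmf = np.maximum(np.real(np.fft.ifft(hat_p)), 0.0)
    return lo, pmf / pmf.sum()

# Usage: a 3-SIIRV of order n = 200 concentrates on a short window around its mean.
rng = np.random.default_rng(0)
samples = rng.integers(0, 3, size=(5000, 200)).sum(axis=1)
lo, pmf = learn_via_dft(samples, window=256, thresh=0.05)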
The Fourier Transform of Poisson Multinomial Distributions and its Algorithmic Applications
[abstract][pdf][arxiv] I. Diakonikolas, D. Kane, A. Stewart
Proceedings of the 48th Annual ACM Symposium on Theory of Computing (STOC 2016)
We study Poisson Multinomial Distributions -- a fundamental family of discrete distributions
that generalize the binomial and multinomial distributions, and are commonly encountered
in computer science.
Formally, an $(n, k)$-Poisson Multinomial Distribution (PMD) is a random variable
of the form $X = \sum_{i=1}^n X_i$, where the $X_i$'s are independent random vectors supported
on the set $\{e_1, e_2, \ldots, e_k \}$ of standard basis vectors in $\mathbb{R}^k$.
In this paper, we obtain a refined structural understanding of PMDs
by analyzing their Fourier transform.
As our core structural result, we prove that the Fourier transform of PMDs is \emph{approximately sparse},
i.e., roughly speaking, its $L_1$-norm is small outside a small set. By building on this result, we obtain the following
applications:
Learning Theory.
We design the first computationally efficient learning algorithm for PMDs
with respect to the total variation distance. Our algorithm learns an arbitrary $(n, k)$-PMD
within variation distance $\epsilon$ using a near-optimal sample size of $\widetilde{O}_k(1/\epsilon^2),$
and runs in time $\widetilde{O}_k(1/\epsilon^2) \cdot \log n.$ Previously, no algorithm with a $\mathrm{poly}(1/\epsilon)$
runtime was known, even for $k=3.$
Game Theory. We give the first efficient polynomial-time approximation scheme (EPTAS) for computing Nash equilibria
in anonymous games. For normalized anonymous games
with $n$ players and $k$ strategies, our algorithm computes a well-supported $\epsilon$-Nash equilibrium in time
$n^{O(k^3)} \cdot (k/\epsilon)^{O(k^3\log(k/\epsilon)/\log\log(k/\epsilon))^{k-1}}.$
The best previous algorithm for this problem~\cite{DaskalakisP08, DaskalakisP2014}
had running time $n^{(f(k)/\epsilon)^k},$ where $f(k) = \Omega(k^{k^2})$, for any $k>2.$
Statistics. We prove a multivariate central limit theorem (CLT) that relates
an arbitrary PMD to a discretized multivariate Gaussian with the same mean and covariance, in total variation distance.
Our new CLT strengthens the CLT of Valiant and Valiant~\cite{VV10b, ValiantValiant:11} by completely removing the dependence on $n$ in the error bound.
Along the way we prove several new structural results of independent interest about PMDs. These include: (i) a robust moment-matching lemma,
roughly stating that two PMDs that approximately agree on their low-degree parameter moments are close in variation distance;
(ii) near-optimal size proper $\epsilon$-covers for PMDs in total variation distance (constructive upper bound and nearly-matching lower bound).
In addition to Fourier analysis, we employ a number of analytic tools, including the saddlepoint method from complex analysis,
that may find other applications.
Testing Shape Restrictions of Discrete Distributions
[abstract][pdf] C. Canonne, I. Diakonikolas, T. Gouleakis, R. Rubinfeld
Proceedings of the 33rd International Symposium on Theoretical Aspects of Computer Science (STACS 2016)
Invited to Special Issue for STACS 2016
We study the question of testing structured properties (classes) of discrete distributions. Specifically, given
sample access to an arbitrary distribution $D$ over $[n]$ and a property $\mathcal{P}$, the goal is to distinguish
between the case that $D\in \mathcal{P}$ and the case that $D$ is $\epsilon$-far in $\ell_1$ distance from $\mathcal{P}$.
We develop a general algorithm for this question, which applies to a large range of "shape-constrained" properties,
including monotone, log-concave, $t$-modal, piecewise-polynomial, and Poisson Binomial distributions. Moreover, for all cases
considered, our algorithm has near-optimal sample complexity with regard to the domain size and is computationally efficient.
For most of these classes, we provide the first non-trivial tester in the literature.
In addition, we also describe a generic method to prove lower bounds for this problem, and use it to show our upper bounds are nearly tight.
Finally, we extend some of our techniques to tolerant testing, deriving nearly-tight upper and lower bounds for the corresponding questions.
Differentially Private Learning of Structured Discrete Distributions
[abstract][pdf] I. Diakonikolas, M. Hardt, L. Schmidt
Advances in Neural Information Processing Systems (NIPS 2015)
We investigate the problem of learning an unknown probability distribution
over a discrete population from random samples. Our goal is to design
efficient algorithms that simultaneously achieve low error in total variation
norm while guaranteeing Differential Privacy to the individuals of the
population.
We describe a general approach that yields near sample-optimal and computationally efficient differentially
private estimators for a wide range of well-studied and natural distribution families. Our theoretical results
show that for a wide variety of structured distributions there exist private estimation algorithms that are nearly
as efficient---both in terms of sample size and running time---as their non-private counterparts. We complement our theoretical
guarantees with an experimental evaluation. Our experiments illustrate the speed and accuracy
of our private estimators on both synthetic mixture models, as well as a large public data set.
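As a toy illustration of the private-estimation setting (a minimal sketch only; this is the textbook Laplace mechanism, not the estimators developed in the paper), one can privatize a histogram estimator by perturbing each bin count with noise calibrated to the count vector's sensitivity and renormalizing.

import numpy as np

def dp_histogram(samples, n_bins, eps_priv, rng=None):
    """eps_priv-differentially private histogram estimate over {0, ..., n_bins - 1}."""
    rng = rng or np.random.default_rng(0)
    counts = np.bincount(samples, minlength=n_bins).astype(float)
    noisy = counts + rng.laplace(scale=2.0 / eps_priv, size=n_bins)  # changing one sample moves two counts by 1
    noisy = np.maximum(noisy, 0.0)
    return noisy / noisy.sum() if noisy.sum() > 0 else np.full(n_bins, 1.0 / n_bins)

# Usage:
samples = np.random.default_rng(1).integers(0, 20, size=5000)
print(dp_histogram(samples, n_bins=20, eps_priv=1.0).round(3))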
Optimal Algorithms and Lower Bounds for Testing Closeness of Structured Distributions
[abstract][pdf] I. Diakonikolas, D. Kane, V. Nikishkin
Proceedings of the 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2015)
We give a general unified method that can be used for $L_1$ closeness testing of a wide range of univariate structured distribution families.
More specifically, we design a sample optimal and computationally efficient algorithm for testing
the identity of two unknown (potentially arbitrary) univariate distributions under the $\mathcal{A}_k$-distance metric:
Given sample access to distributions with density functions $p, q: I \to \mathbb{R}$, we want to distinguish
between the cases that $p=q$ and $\|p-q\|_{\mathcal{A}_k} \ge \epsilon$ with probability at least $2/3$.
We show that for any $k \ge 2, \epsilon>0$, the optimal sample complexity of the $\mathcal{A}_k$-closeness testing
problem is $\Theta(\max\{ k^{4/5}/\epsilon^{6/5}, k^{1/2}/\epsilon^2 \})$.
This is the first $o(k)$ sample algorithm for this problem, and yields
new, simple $L_1$ closeness testers, in most cases with optimal sample complexity,
for broad classes of structured distributions.
On the Complexity of Optimal Lottery Pricing and Randomized Mechanisms
[abstract][pdf] X. Chen, I. Diakonikolas, A. Orfanou, D. Paparas, X. Sun, M. Yannakakis
Proceedings of the 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2015)
We study the optimal lottery problem and the optimal mechanism design problem
in the setting of a single unit-demand buyer with item values drawn from independent distributions.
Optimal solutions to both problems are characterized by a linear program with exponentially many variables.
For the menu size complexity of the optimal lottery problem, we present an explicit, simple instance
with distributions of support size $2$, and show that exponentially many lotteries
are required to achieve the optimal revenue. We also show that, when distributions have support size $2$
and share the same high value, the simpler scheme of item pricing can achieve the same revenue as the optimal
menu of lotteries. The same holds for the case of two items with support size $2$
(but not necessarily the same high value).
For the computational complexity of the optimal mechanism design problem,
we show that unless the polynomial-time hierarchy collapses
(more precisely, $\mathrm{P}^{\mathrm{NP}}=\mathrm{P}^{\mathrm{\#P}}$), there is
no universal efficient randomized algorithm to implement
an optimal mechanism even when distributions have support size $3$.
Fast and Near-Optimal Algorithms for Approximating Distributions by Histograms
[abstract][pdf] J. Acharya, I. Diakonikolas, C. Hegde, J. Li, L. Schmidt
Proceedings of the 34th Annual ACM Symposium on Principles of Database Systems (PODS 2015)
Histograms are among the most popular structures for the succinct summarization of data in a variety of database applications.
In this work, we provide fast and near-optimal algorithms for approximating arbitrary one dimensional data distributions by histograms.
A $k$-histogram is a piecewise constant function with $k$ pieces.
We consider the following natural problem, previously studied by Indyk, Levi, and Rubinfeld in PODS 2012:
Given samples from a distribution $p$ over $\{1, \ldots, n \}$, compute a $k$-histogram that minimizes the $\ell_2$-distance from $p$, up to an additive $\epsilon$.
We design an algorithm for this problem that uses the information-theoretically minimal sample size of $m = O(1/\epsilon^2)$, runs in sample-linear time $O(m)$,
and outputs an $O(k)$-histogram whose $\ell_2$-distance from $p$ is at most $O(\mathrm{opt}_k) + \epsilon$, where $\mathrm{opt}_k$ is the minimum
$\ell_2$-distance between $p$ and any $k$-histogram.
Perhaps surprisingly, the sample size and running time of our algorithm are independent of the universe size $n$.
We generalize our approach to obtain fast algorithms for multi-scale histogram construction, as well as approximation by piecewise polynomial distributions.
We experimentally demonstrate one to two orders of magnitude improvement in terms of empirical running times over previous state-of-the-art algorithms.
Testing Identity of Structured Distributions
[abstract][pdf] I. Diakonikolas, D. Kane, V. Nikishkin
Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2015)
We study the question of identity testing for structured distributions.
More precisely, given samples from a {\em structured} distribution $q$ over $[n]$ and an explicit distribution $p$ over $[n]$,
we wish to distinguish whether $q=p$ versus $q$ is at least $\epsilon$-far from $p$,
in $L_1$ distance. In this work, we present a unified approach that yields new, simple testers, with sample complexity
that is information-theoretically optimal, for broad classes of structured distributions, including $t$-flat distributions,
$t$-modal distributions, log-concave distributions, monotone hazard rate (MHR) distributions, and mixtures thereof.
Learning from Satisfying Assignments
[abstract][pdf] A. De, I. Diakonikolas, R. Servedio
Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2015)
This paper studies the problem of learning ``low-complexity''
probability distributions over the Boolean hypercube $\{-1,1\}^n$.
As in the standard PAC learning model, a learning problem in our
framework is defined by a class ${\cal C}$ of Boolean functions
over $\{-1,1\}^n$, but
in our model the learning algorithm is given uniform random satisfying
assignments of an unknown $f \in \cal{C}$ and its goal is to output
a high-accuracy approximation of the uniform distribution
over $f^{-1}(1).$ This distribution learning problem may be viewed as a
demanding variant of standard Boolean function learning, where the learning
algorithm only receives positive examples and -- more importantly --
must output a hypothesis function which has small \emph{multiplicative}
error (i.e., small error relative to the size of $f^{-1}(1)$).
As our main results, we show that the two most widely studied
classes of Boolean functions in computational learning theory --
linear threshold functions and DNF formulas -- have
efficient distribution learning algorithms in our model.
Our algorithm for linear threshold functions runs in
time poly$(n,1/\epsilon)$ and our algorithm for
polynomial-size DNF runs in time quasipoly$(n,1/\epsilon)$.
We obtain both these results via a general approach that
combines a broad range of technical ingredients, including the complexity-theoretic study of approximate counting and uniform generation;
the Statistical Query model from learning theory; and hypothesis testing techniques from statistics. A key conceptual and technical ingredient of
this approach is a new kind of algorithm that we devise, called a ``densifier'', which we
believe may be useful in other contexts.
We also establish limitations on efficient learnability in our model by showing
that the existence of certain types of cryptographic signature schemes
implies that certain learning problems in our framework are computationally
hard. Via this connection we show that
assuming the existence of sufficiently strong unique signature schemes,
there are no sub-exponential time learning algorithms in our framework for
intersections of two halfspaces, for
degree-2 polynomial threshold functions, or for monotone 2-CNF formulas.
Thus our positive results for distribution learning come close to the
limits of what can be achieved by efficient algorithms.
Near-Optimal Density Estimation in Near-Linear Time Using Variable-Width Histograms
[abstract][pdf] S. Chan, I. Diakonikolas, R. Servedio, X. Sun
Advances in Neural Information Processing Systems (NIPS 2014)
Let $p$ be an unknown and arbitrary probability distribution over $[0,1)$. We consider the problem
of density estimation, in which a learning algorithm is given i.i.d. draws from $p$ and must
(with high probability) output a hypothesis distribution that is close to $p$. The main contribution of this paper is
a highly efficient density estimation algorithm for learning using a variable-width histogram, i.e., a
hypothesis distribution with a piecewise constant probability density function.
In more detail, for any $k$ and $\epsilon$, we give an algorithm that makes $\tilde{O}(k/\epsilon^2)$ draws from $p$, runs in $\tilde{O}(k/\epsilon^2)$ time, and outputs a
hypothesis distribution $h$ that is piecewise constant with $O(k \log^2(1/\epsilon))$ pieces. With high probability the
hypothesis $h$ satisfies $d_{\mathrm{TV}}(p,h) \leq C \cdot \mathrm{opt}_k(p) + \epsilon$, where $d_{\mathrm{TV}}$ denotes the total variation distance
(statistical distance), $C$ is a universal constant,
and $\mathrm{opt}_k(p)$ is the smallest total variation distance between $p$ and any $k$-piecewise constant distribution.
The sample size and running time of our algorithm are optimal up to logarithmic factors.
The ``approximation factor'' $C$ in our result is inherent in the problem, as we prove that
no algorithm with sample size bounded in terms of $k$ and $\epsilon$ can achieve $C<2$ regardless
of what kind of hypothesis distribution it uses.
Efficient Density Estimation via Piecewise Polynomial Approximation
[abstract][pdf] S. Chan, I. Diakonikolas, R. Servedio, X. Sun
Proceedings of the 46th Annual ACM Symposium on Theory of Computing (STOC 2014)
We give a highly efficient "semi-agnostic" algorithm for learning univariate probability distributions that are well approximated by piecewise polynomial density functions. Let $p$ be an arbitrary distribution over an interval $I$ which is $\tau$-close (in total variation distance) to an unknown probability distribution $q$ that is defined by an unknown partition of $I$ into $t$ intervals and $t$ unknown degree-$d$ polynomials specifying $q$ over each of the intervals. We give an algorithm that draws $\tilde{O}(t(d+1)/\epsilon^2)$ samples from $p$, runs in time $\mathrm{poly}(t,d,1/\epsilon)$, and with high probability outputs a piecewise polynomial hypothesis distribution $h$ that is $(O(\tau)+\epsilon)$-close (in total variation distance) to $p$. This sample complexity is essentially optimal; we show that even for $\tau=0$, any algorithm that learns an unknown $t$-piecewise degree-$d$ probability distribution over $I$ to accuracy $\epsilon$ must use $\Omega({\frac {t(d+1)} {\mathrm{poly}(1 + \log(d+1))}} \cdot {\frac 1 {\epsilon^2}})$ samples from the distribution, regardless of its running time. Our algorithm combines tools from approximation theory, uniform convergence, linear programming, and dynamic programming.
We apply this general algorithm to obtain a wide range of results for many natural problems in density estimation over both continuous and discrete domains. These include state-of-the-art results for learning mixtures of log-concave distributions; mixtures of $t$-modal distributions; mixtures of Monotone Hazard Rate distributions; mixtures of Poisson Binomial Distributions; mixtures of Gaussians; and mixtures of $k$-monotone densities. Our general technique yields computationally efficient algorithms for all these problems, in many cases with provably optimal sample complexities (up to logarithmic factors) in all parameters.
Deterministic Approximate Counting for Juntas of Degree-$2$ Polynomial Threshold Functions
[abstract][pdf] A. De, I. Diakonikolas, R. Servedio
Proceedings of the 29th Annual IEEE Conference on Computational Complexity (CCC 2014)
Let $g: \{-1,1\}^k \rightarrow \{-1,1\}$ be any Boolean function
and $q_1,\dots,q_k$ be any degree-$2$ polynomials over
$\{-1,1\}^n.$ We give a deterministic algorithm which,
given as input explicit descriptions of
$g,q_1,\dots,q_k$ and an accuracy parameter $\epsilon>0$,
approximates
\[
\mathbf{Pr}_{x \sim \{-1,1\}^n}[g(\mathrm{sign}(q_1(x)),\dots,\mathrm{sign}(q_k(x)))=1]
\]
to within an additive $\pm \epsilon$. For any constant $\epsilon > 0$
and $k \geq 1$ the running time of our algorithm is a fixed
polynomial in $n$ (in fact this is true even for some not-too-small
$\epsilon = o_n(1)$ and not-too-large $k = \omega_n(1)$).
This is the first fixed polynomial-time algorithm
that can deterministically approximately count
satisfying assignments of a natural
class of depth-$3$ Boolean circuits.
Our algorithm extends a recent result \cite{DDS13:deg2count}
which gave a deterministic
approximate counting algorithm for a single degree-$2$ polynomial
threshold function $\mathrm{sign}(q(x)),$ corresponding to the $k=1$ case of our
result. Note that even in the $k=1$ case it is NP-hard to determine
whether $\mathbf{Pr}_{x \sim \{-1,1\}^n}[\mathrm{sign}(q(x))=1]$ is nonzero,
so any sort of multiplicative approximation is almost certainly
impossible even for efficient randomized algorithms.
Our algorithm and analysis requires several novel technical ingredients
that go significantly beyond the tools required to handle the $k=1$ case
in \cite{DDS13:deg2count}. One of these
is a new multidimensional central limit theorem
for degree-$2$ polynomials in Gaussian random variables which builds
on recent Malliavin-calculus-based results from probability theory. We
use this CLT as the basis of a new decomposition technique for $k$-tuples
of degree-$2$ Gaussian polynomials and thus obtain an efficient
deterministic approximate counting
algorithm for the Gaussian distribution, i.e., an algorithm for estimating
\[
\mathbf{Pr}_{x \sim \mathcal{N}(0,1)^n}[g(\mathrm{sign}(q_1(x)),\dots,\mathrm{sign}(q_k(x)))=1].
\]
Finally, a third new ingredient
is a ``regularity lemma'' for $k$-tuples of degree-$d$
polynomial threshold functions. This generalizes both the regularity lemmas
of \cite{DSTW:10,HKM:09}
(which apply to a single degree-$d$ polynomial threshold
function) and the regularity lemma of Gopalan et al \cite{GOWZ10}
(which applies to
$k$-tuples of linear threshold functions, i.e., the case $d=1$).
Our new regularity lemma lets us extend our deterministic approximate
counting results from the Gaussian to the Boolean domain.
The Complexity of Optimal Multidimensional Pricing
[abstract][pdf] X. Chen, I. Diakonikolas, D. Paparas, X. Sun, M. Yannakakis
Games and Economic Behavior, accepted with minor revisions
Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2014)
We resolve the complexity of revenue-optimal deterministic auctions in
the unit-demand single-buyer Bayesian setting, i.e., the optimal item
pricing problem, when the buyer's values for the items are independent.
We show that the problem of computing a revenue-optimal pricing can be solved
in polynomial time for distributions of support size $2$
and its decision version is NP-complete for distributions of support size $3$.
We also show that the problem remains NP-complete for the case of identical distributions.
Optimal Algorithms for Testing Closeness of Discrete Distributions
[abstract][pdf] S. Chan, I. Diakonikolas, G. Valiant, P. Valiant
Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2014)
Blog post about this work: Property Testing Review
We study the question of closeness testing for two discrete distributions.
More precisely, given samples from two distributions $p$ and $q$ over an $n$-element set,
we wish to distinguish whether $p=q$ versus $p$ is at least $\epsilon$-far from $q$,
in either $\ell_1$ or $\ell_2$ distance. Batu et al~\cite{BFR+:00, Batu13} gave the first sub-linear time algorithms for these problems,
which matched the lower bounds of~\cite{PV11sicomp} up to a logarithmic factor in $n$ and a polynomial factor in $\epsilon.$
In this work, we present simple testers for both the $\ell_1$ and $\ell_2$ settings, with sample complexity
that is information-theoretically optimal, to constant factors, both in the dependence on $n$, and the dependence on $\epsilon$;
for the $\ell_1$ testing problem we establish that the sample complexity is $\Theta(\max\{n^{2/3}/\epsilon^{4/3}, n^{1/2}/\epsilon^2 \}).$
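For concreteness, here is a short Python sketch of a chi-square-style statistic in the spirit of the paper's testers (my illustration; the exact statistic, constants, and thresholds are in the paper): with Poissonized sample sizes and per-element counts $X_i, Y_i$, the quantity $\sum_i ((X_i - Y_i)^2 - X_i - Y_i)/(X_i + Y_i)$ has mean zero when $p = q$ and grows when the distributions are far apart.

import numpy as np

def closeness_statistic(samples_p, samples_q, n):
    """Chi-square-style closeness statistic from per-element counts over {0, ..., n-1}."""
    X = np.bincount(samples_p, minlength=n).astype(float)
    Y = np.bincount(samples_q, minlength=n).astype(float)
    mask = (X + Y) > 0
    return float((((X - Y) ** 2 - X - Y)[mask] / (X + Y)[mask]).sum())

# Usage: Poissonize the number of samples drawn from each distribution, then threshold.
rng = np.random.default_rng(0)
n, m = 1000, 2000
stat = closeness_statistic(rng.integers(0, n, size=rng.poisson(m)),
                           rng.integers(0, n, size=rng.poisson(m)), n)
print(stat)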
A Polynomial-time Approximation Scheme for Fault-tolerant Distributed Storage
[abstract][pdf] C. Daskalakis, A. De, I. Diakonikolas, A. Moitra, R. Servedio
Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2014)
Featured on the Abstract Talk podcast.
We consider a problem which has received considerable attention in
systems literature because of its applications to routing in delay tolerant networks and replica placement
in distributed storage systems.
In abstract terms the problem can be stated as follows: Given a random variable $X$
generated by a known product distribution over $\{0,1\}^n$ and a target
value $0 \leq \theta \leq 1$, output a non-negative vector $w$, with
$\|w\|_1 \le 1$, which maximizes the probability of the event $w \cdot X
\ge \theta$. This is a challenging non-convex optimization problem for
which even computing the value $\Pr[w \cdot X \ge \theta]$ of a proposed
solution vector $w$ is #P-hard.
We provide an additive EPTAS for this problem
which, for constant-bounded product distributions,
runs in $ \mathrm{poly}(n) \cdot 2^{\mathrm{poly}(1/\epsilon)}$ time and outputs an
$\epsilon$-approximately optimal solution vector $w$ for this problem. Our
approach is inspired by, and extends,
recent structural results from the complexity-theoretic
study of linear threshold functions. Furthermore, in spite of the objective function being non-smooth,
we give a unicriterion PTAS while previous work for such objective functions has typically
led to a bicriterion PTAS. We believe our techniques may be applicable to get unicriterion PTAS for other non-smooth objective functions.
Learning Sums of Independent Integer Random Variables
[abstract][pdf] C. Daskalakis, I. Diakonikolas, R. O'Donnell, R. Servedio, L-Y. Tan
Proceedings of the 54th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2013)
Blog post about this work: MIT theory student blog
Let $\mathbf{S} = \mathbf{X}_1 + \cdots + \mathbf{X}_n$ be a sum of $n$ independent
integer random variables $\mathbf{X}_i$, where each $\mathbf{X}_i$ is supported on
$\{0,1,\dots,k-1\}$ but otherwise may have an arbitrary distribution
(in particular the $\mathbf{X}_i$'s need not be identically distributed).
How many samples are required to learn the distribution $\mathbf{S}$ to high
accuracy? In this paper we show
that the answer is completely independent of $n$, and moreover we give a
computationally efficient algorithm which achieves this low sample
complexity. More precisely, our algorithm learns any such $\mathbf{S}$ to $\epsilon$-accuracy (with respect
to the total variation distance between distributions)
using $\mathrm{poly}(k,1/\epsilon)$ samples, independent of $n$. Its running time is
$\mathrm{poly}(k,1/\epsilon)$ in the standard word RAM model. Thus we give
a broad generalization of the main result of~\cite{DDS12stoc}
which gave a similar learning result for the special case $k=2$ (when
the distribution~$\mathbf{S}$ is a Poisson Binomial Distribution).
Prior to this work, no nontrivial results were
known for learning these distributions even in the case $k=3$.
A key difficulty is that, in contrast to the case of $k = 2$,
sums of independent $\{0,1,2\}$-valued random variables may
behave very differently from
(discretized) normal distributions, and in fact may be
rather complicated --- they are not log-concave, they can
be $\Theta(n)$-modal, there is no relationship between
Kolmogorov distance and total variation distance for the class, etc.
Nevertheless, the heart of our learning result is a new limit theorem
which characterizes what the sum of an arbitrary number of arbitrary
independent $\{0,1,\dots, k-1\}$-valued random variables may look like.
Previous limit theorems in this setting made strong assumptions on the
"shift invariance" of the random variables $\mathbf{X}_i$ in order to
force a discretized normal limit. We believe that our new
limit theorem, as the first result for truly arbitrary sums of
independent $\{0,1,\dots,k-1\}$-valued random variables,
is of independent interest.
Deterministic Approximate Counting for Degree-$2$ Polynomial Threshold Functions
[abstract][pdf] A. De, I. Diakonikolas, R. Servedio
Manuscript, 2013. Available as ECCC technical report [link]
We give a deterministic algorithm for
approximately computing the fraction of Boolean assignments
that satisfy a degree-$2$ polynomial threshold function.
Given a degree-$2$ input polynomial $p(x_1,\dots,x_n)$
and a parameter $\epsilon > 0$, the algorithm approximates
$\Pr_{x \sim \{-1,1\}^n}[p(x) \geq 0]$
to within an additive $\pm \epsilon$ in time $\mathrm{poly}(n,2^{\mathrm{poly}(1/\epsilon)})$.
Note that it is NP-hard to determine whether the above probability
is nonzero, so any sort of multiplicative approximation is almost certainly
impossible even for efficient randomized algorithms.
This is the first deterministic algorithm for this counting problem
in which the running time is polynomial in $n$ for $\epsilon= o(1)$.
For "regular" polynomials $p$ (those in which no individual variable's
influence is large compared to the sum of all $n$
variable influences)
our algorithm runs in $\mathrm{poly}(n,1/\epsilon)$ time.
The algorithm also runs in $\mathrm{poly}(n,1/\epsilon)$ time to approximate
$\Pr_{x \sim \mathcal{N}(0,1)^n}[p(x) \geq 0]$ to within an additive $\pm \epsilon$,
for any degree-2 polynomial $p$.
As an application of our counting result, we give a deterministic
multiplicative $(1 \pm \epsilon)$-approximation algorithm
to approximate the $k$-th absolute moment $\mathbf{E}_{x \sim \{-1,1\}^n}[|p(x)^k|]$
of a degree-$2$ polynomial. The algorithm runs in fixed
polynomial time for any constants $k$ and $\epsilon.$
A robust Khintchine Inequality and computing optimal constants in Fourier analysis and high-dimensional geometry
[abstract][pdf] A. De, I. Diakonikolas, R. Servedio
SIAM Journal on Discrete Mathematics, 30-2 (2016), pp. 1058-1094
Proceedings of the 40th Intl. Colloquium on Automata, Languages and Programming (ICALP 2013)
This paper makes two contributions towards determining some well-studied
optimal constants in Fourier analysis of Boolean functions
and high-dimensional geometry.
(1) It has been known since 1994 \cite{GL:94} that every linear threshold function has squared Fourier mass
at least $1/2$ on its degree-$0$ and degree-$1$ coefficients.
Denote the minimum such Fourier mass by $\mathbf{W}^{\leq 1}[\mathbf{LTF}]$,
where the minimum is taken over all $n$-variable linear threshold functions and all $n \ge 0$.
Benjamini, Kalai and Schramm \cite{BKS:99}
have conjectured that the true value of $\mathbf{W}^{\leq 1}[\mathbf{LTF}]$ is $2/\pi$.
We make progress on this conjecture by proving that $\mathbf{W}^{\leq 1}[\mathbf{LTF}]
\geq 1/2 + c$ for some absolute constant $c>0$.
The key ingredient in our proof is a "robust" version of the well-known
Khintchine inequality in functional analysis, which we
believe may be of independent interest.
(2) We give an algorithm with the following property: given any $\eta > 0$,
the algorithm runs in time $2^{\mathrm{poly}(1/\eta)}$ and determines the value of
$\mathbf{W}^{\leq 1}[\mathbf{LTF}]$ up to an additive error of $\pm\eta$. We give a similar
$2^{{\mathrm{poly}(1/\eta)}}$-time algorithm to determine Tomaszewski's constant
to within an additive error of $\pm \eta$; this is the
minimum (over all origin-centered hyperplanes $H$) fraction of points
in $\{-1,1\}^n$ that lie within Euclidean distance $1$ of $H$.
Tomaszewski's constant is conjectured to be $1/2$; lower bounds on it
have been given by Holzman and Kleitman \cite{HK92} and
independently by Ben-Tal, Nemirovski and Roos
\cite{BNR02}.
Our algorithms combine tools from anti-concentration
of sums of independent random variables, Fourier analysis, and Hermite
analysis of linear threshold functions.
Learning Mixtures of Structured Distributions over Discrete Domains
[abstract][pdf] S. Chan, I. Diakonikolas, R. Servedio, X. Sun
Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2013)
Let $\mathfrak{C}$ be a class of probability distributions over the discrete domain $[n] = \{1,\dots,n\}.$
We show that if $\mathfrak{C}$ satisfies a rather general condition -- essentially, that each distribution in
$\mathfrak{C}$ can be well-approximated by a variable-width
histogram with few bins -- then there is a highly efficient (both in terms of running time and sample complexity)
algorithm that can learn any mixture of $k$ unknown distributions from
$\mathfrak{C}.$
We analyze several natural types of distributions over $[n]$,
including log-concave, monotone hazard rate and unimodal distributions,
and show that they have the required structural property of being
well-approximated by a histogram with few bins.
Applying our general algorithm, we
obtain near-optimally efficient algorithms for all these mixture
learning problems as described below. More precisely,
Log-concave distributions: We learn any mixture of $k$
log-concave distributions over $[n]$ using $k \cdot
\tilde{O}(1/\epsilon^4)$ samples (independent of $n$) and running in time
$\tilde{O}(k \log(n) / \epsilon^4)$ bit-operations (note that reading a single
sample from $[n]$ takes $\Theta(\log n)$ bit operations).
For the special case $k=1$ we give an efficient
algorithm using $\tilde{O}(1/\epsilon^3)$
samples; this generalizes the main result of \cite{DDS12stoc} from the
class of Poisson Binomial distributions to the much broader class of all
log-concave distributions. Our upper bounds are not far from
optimal since any algorithm for this learning problem requires
$\Omega(k/\epsilon^{5/2})$ samples.
Monotone hazard rate (MHR) distributions:
We learn any mixture of $k$ MHR distributions over $[n]$ using
$O(k \log (n/\epsilon)/\epsilon^4)$ samples and running in time $\tilde{O}(k
\log^2(n) / \epsilon^4)$ bit-operations. Any algorithm for this learning problem must use $\Omega(k \log(n)/\epsilon^3)$ samples.
Unimodal distributions:
We give an algorithm that learns any mixture of $k$ unimodal distributions
over $[n]$ using $O(k \log (n)/\epsilon^{4})$ samples and running in time
$\tilde{O}(k \log^2(n) / \epsilon^{4})$ bit-operations.
Any algorithm for this problem must use $\Omega(k \log(n)/\epsilon^3)$ samples.
Testing $k$-modal Distributions: Optimal Algorithms via Reductions
[abstract][pdf] C. Daskalakis, I. Diakonikolas, R. Servedio, G. Valiant, P. Valiant
Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2013)
We give highly efficient algorithms, and almost matching lower bounds, for a range of basic statistical problems
that involve testing and estimating the $L_1$ (total variation) distance between two $k$-modal distributions $p$ and $q$ over the discrete domain $\{1,\dots,n\}$.
More precisely, we consider the following four problems: given sample access to an unknown $k$-modal
distribution $p$,
Testing identity to a known or unknown distribution:
(1) Determine whether $p = q$ (for an explicitly given $k$-modal distribution $q$) versus
$p$ is $\epsilon$-far from $q$;
(2) Determine whether $p=q$ (where $q$ is available via sample access) versus
$p$ is $\epsilon$-far from $q$;
Estimating $L_1$ distance (``tolerant testing'') against a known or unknown distribution:
(3) Approximate $d_{TV}(p,q)$ to within additive $\epsilon$, where $q$ is an explicitly
given $k$-modal distribution;
(4) Approximate $d_{TV}(p,q)$ to within additive $\epsilon$, where $q$ is available via sample access.
For each of these four problems we give sub-logarithmic sample algorithms that we show are tight up to additive $\mathrm{poly}(k)$
and multiplicative $\mathrm{polylog}\log n+\mathrm{polylog} k$ factors.
Thus our bounds significantly improve the previous results of \cite{BKR:04}, which were for testing identity of distributions (items (1) and (2) above) in the special cases
$k=0$ (monotone distributions) and $k=1$ (unimodal distributions) and required $O((\log n)^3)$ samples.
An Optimal Algorithm for the Efficient Approximation of Convex Pareto Curves
I. Diakonikolas, M. Yannakakis
Manuscript, 2012
On the relation of total variation and Kolmogorov distance between Poisson Binomial distributions
C. Daskalakis, I. Diakonikolas, R. Servedio
Manuscript, 2012
Efficiency-Revenue Tradeoffs in Auctions
[abstract][pdf] I. Diakonikolas, C.H. Papadimitriou, G. Pierrakos, Y. Singer
Proceedings of the 39th Intl. Colloquium on Automata, Languages and Programming (ICALP 2012)
When agents with independent priors bid for a single item, Myerson's optimal auction maximizes expected revenue, whereas Vickrey's second-price auction optimizes social welfare. We address the natural question of trade-offs between the two criteria, that is, auctions that optimize, say, revenue under the constraint that the welfare is above a given level. If one allows for randomized mechanisms, it is easy to see that there are polynomial-time mechanisms that achieve any point in the trade-off (the Pareto curve) between revenue and welfare. We investigate whether one can achieve the same guarantees using deterministic mechanisms. We provide a negative answer to this question by showing that this is a (weakly) NP-hard problem. On the positive side, we provide polynomial-time deterministic mechanisms that approximate with arbitrary precision any point of the trade-off between these two fundamental objectives for the case of two bidders, even when the valuations are correlated arbitrarily. The major problem left open by our work is whether there is such an algorithm for three or more bidders with independent valuation distributions.
The Inverse Shapley Value Problem
[abstract][pdf] A. De, I. Diakonikolas, R. Servedio
Games and Economic Behavior, accepted with minor revisions
Proceedings of the 39th Intl. Colloquium on Automata, Languages and Programming (ICALP 2012)
For $f$ a weighted voting scheme used by $n$ voters to choose between two
candidates, the $n$ Shapley-Shubik Indices (or Shapley values)
of $f$ provide a measure of how
much control each voter can exert over the overall outcome of the vote.
Shapley-Shubik indices were introduced by Lloyd Shapley
and Martin Shubik in 1954 \cite{SS54} and are widely studied in
social choice theory as a measure of the "influence" of voters.
The Inverse Shapley Value Problem is the problem of designing a weighted
voting scheme which (approximately) achieves a desired input vector of
values for the Shapley-Shubik indices. Despite much interest in this problem
no provably correct and efficient algorithm was known prior to our work.
We give the first efficient algorithm with provable performance guarantees for
the Inverse Shapley Value Problem. For any constant $\epsilon > 0$
our algorithm runs in fixed poly$(n)$ time (the degree of the
polynomial is independent of $\epsilon$) and has the following
performance guarantee: given as input a vector of desired Shapley values,
if any "reasonable" weighted voting scheme
(roughly, one in which the threshold is not too skewed)
approximately matches the desired vector of values to within
some small error,
then our algorithm explicitly outputs a weighted voting scheme that
achieves this vector of Shapley values to within error $\epsilon.$
If there is a "reasonable" voting scheme in which all
voting weights are integers at most $\mathrm{poly}(n)$ that approximately achieves
the desired Shapley values, then our algorithm runs in time
$\mathrm{poly}(n)$ and outputs a weighted voting scheme that achieves
the target vector of Shapley values to within
error $\epsilon=n^{-1/8}.$
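To make the definition concrete, the following sketch computes the (forward) Shapley-Shubik indices of a given weighted voting scheme by enumerating all voter orderings; it is illustrative only (the weights and quota below are hypothetical), and it is not the algorithm of the paper, which solves the much harder inverse problem.

from itertools import permutations
from math import factorial

def shapley_shubik(weights, quota):
    # Shapley-Shubik index of voter i: the fraction of the n! voter orderings
    # in which i is pivotal, i.e. the voters before i total less than the
    # quota and adding i's weight meets it.  Brute force, so small n only.
    n = len(weights)
    pivots = [0] * n
    for order in permutations(range(n)):
        running = 0
        for voter in order:
            if running < quota <= running + weights[voter]:
                pivots[voter] += 1
                break
            running += weights[voter]
    return [p / factorial(n) for p in pivots]

# Hypothetical scheme: weights (4, 2, 1, 1) with quota 5.
print(shapley_shubik([4, 2, 1, 1], quota=5))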
Nearly optimal solutions for the Chow Parameters Problem and low-weight approximation of halfspaces
[abstract][pdf] A. De, I. Diakonikolas, V. Feldman, R. Servedio
Journal of the ACM, 61(2), 2014. Invited to Theory of Computing special issue on Analysis of Boolean functions (declined)
Proceedings of the 44th Annual ACM Symposium on Theory of Computing (STOC 2012)
IBM Research 2014 Pat Goldberg Math/CS/EE Best Paper Award
The Chow parameters of a Boolean function $f: \{-1,1\}^n \to \{-1,1\}$ are its $n+1$ degree-$0$ and
degree-$1$ Fourier coefficients. It has been known since 1961 \cite{Chow:61, Tannenbaum:61} that the (exact values of the) Chow parameters of
any linear threshold function $f$ uniquely specify $f$ within the space of all Boolean functions, but until
recently \cite{OS11:chow} nothing was known about efficient algorithms for reconstructing $f$
(exactly or approximately) from exact or approximate values of its Chow parameters. We refer to this reconstruction problem as the Chow Parameters Problem.
Our main result is a new algorithm for the Chow Parameters Problem which,
given (sufficiently accurate approximations to) the Chow parameters of any linear threshold function $f$, runs in time
$\tilde{O}(n^2)\cdot (1/\epsilon)^{O(\log^2(1/\epsilon))}$ and
with high probability outputs a representation of an LTF $f'$ that is $\epsilon$-close to $f$ in Hamming distance.
The only previous algorithm \cite{OS11:chow} had running time $\mathrm{poly}(n) \cdot 2^{2^{\tilde{O}(1/\epsilon^2)}}.$
As a byproduct of our approach, we show that for any linear threshold function $f$ over $\{-1,1\}^n$,
there is a linear threshold function $f'$ which is $\epsilon$-close to $f$ and has all weights that are integers of magnitude at most $\sqrt{n} \cdot (1/\epsilon)^{O(\log^2(1/\epsilon))}$.
This significantly improves the previous best result of~\cite{DiakonikolasServedio:09} which gave
a $\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}$ weight bound, and is close to the
known lower bound of
$\max\{\sqrt{n},$ $(1/\epsilon)^{\Omega(\log \log (1/\epsilon))}\}$ \cite{Goldberg:06b,Servedio:07cc}.
Our techniques also yield improved algorithms for related problems in learning theory.
In addition to being significantly stronger than previous work, our results
are obtained using conceptually simpler proofs.
The two main ingredients underlying our results are (1) a new structural
result showing that for any linear threshold function $f$ and any bounded
function $g$, if the Chow parameters of $f$ are close to the Chow
parameters of $g$, then $f$ is close to $g$; and (2) a new boosting-like algorithm
that, given approximations to the Chow parameters of a linear threshold function $f$, outputs a bounded
function whose Chow parameters are close to those of $f$.
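As an illustration of the objects involved (not of the algorithm itself), the sketch below computes the Chow parameters of a small, hypothetical linear threshold function exactly, by enumerating all $2^n$ inputs.

from itertools import product

def chow_parameters(f, n):
    # The n+1 Chow parameters of f : {-1,1}^n -> {-1,1} are E[f(x)] and
    # E[f(x) x_i] for i = 1..n, i.e. the degree-0 and degree-1 Fourier
    # coefficients.  Exact computation by enumerating all 2^n inputs.
    chow = [0.0] * (n + 1)
    for x in product([-1, 1], repeat=n):
        fx = f(x)
        chow[0] += fx
        for i, xi in enumerate(x):
            chow[i + 1] += fx * xi
    return [c / 2 ** n for c in chow]

# Hypothetical LTF on 3 variables: f(x) = sign(3 x_1 + 2 x_2 + x_3 - 1).
def ltf(x):
    return 1 if 3 * x[0] + 2 * x[1] + x[2] - 1 > 0 else -1

print(chow_parameters(ltf, 3))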
Learning Poisson Binomial distributions
[abstract][pdf] C. Daskalakis, I. Diakonikolas, R. Servedio
Invited to Algorithmica special issue on New Theoretical Challenges in Machine Learning
Proceedings of the 44th Annual ACM Symposium on Theory of Computing (STOC 2012)
We consider a basic problem in unsupervised learning:
learning an unknown Poisson Binomial Distribution.
A Poisson Binomial Distribution (PBD) over $\{0,1,\dots,n\}$
is the distribution of a sum of $n$ independent Bernoulli
random variables which may have arbitrary, potentially non-equal,
expectations. These distributions were first studied by S. Poisson in 1837 \cite{Poisson:37} and are a natural
$n$-parameter generalization of the familiar Binomial Distribution.
Surprisingly, prior to our work this basic learning problem
was poorly understood, and known results for it were far from
optimal.
We essentially settle the complexity of the learning problem for this
basic class of distributions.
As our main result we give a highly efficient algorithm which learns to
$\epsilon$-accuracy using $\tilde{O}(1/\epsilon^3)$ samples independent of $n$.
The running time of the algorithm is quasilinear in the
size of its input data. This is nearly optimal
since any algorithm must use $\Omega(1/\epsilon^2)$ samples.
We also give positive and negative results for some extensions of this learning problem.
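For concreteness, here is a minimal sketch of the distribution family being learned (the parameters below are hypothetical): the exact pmf of a Poisson Binomial Distribution can be computed by a simple dynamic program, and a sample is a sum of independent Bernoulli draws.

import random

def pbd_pmf(ps):
    # Exact probability mass function of the Poisson Binomial Distribution
    # with parameters p_1,...,p_n (the law of a sum of independent
    # Bernoulli(p_i) variables), by dynamic programming over prefix sums.
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)
            nxt[k + 1] += mass * p
        pmf = nxt
    return pmf

def pbd_sample(ps, rng):
    return sum(rng.random() < p for p in ps)

# Hypothetical parameters; the empirical histogram of samples converges to the pmf.
ps = [0.9, 0.6, 0.5, 0.2, 0.1]
print(pbd_pmf(ps))
print(pbd_sample(ps, random.Random(0)))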
Learning $k$-modal distributions via testing
[abstract][pdf] C. Daskalakis, I. Diakonikolas, R. Servedio
Theory of Computing, 10 (20), 535-570 (2014)
Proceedings of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2012)
Oded Goldreich's Choices #72
A $k$-modal probability distribution over the domain $\{1,...,n\}$ is one whose
histogram has at most $k$ "peaks" and "valleys." Such distributions are
natural generalizations of monotone ($k=0$) and unimodal ($k=1$)
probability distributions, which have been intensively studied in probability theory and statistics.
In this paper we consider the problem of learning an unknown $k$-modal distribution.
The learning algorithm is given access to independent samples drawn from the $k$-modal
distribution $p$, and must output a hypothesis distribution $\hat{p}$ such that with high
probability the total variation distance between $p$ and $\hat{p}$ is at most $\epsilon.$
We give an efficient algorithm for this problem that runs in time $\mathrm{poly}(k,\log(n),1/\epsilon)$.
For $k \leq \tilde{O}(\sqrt{\log n})$, the number of samples used by our algorithm is very close (within an
$\tilde{O}(\log(1/\epsilon))$ factor) to being information-theoretically optimal. Prior to this
work computationally efficient algorithms were known only for the cases $k=0,1$
\cite{Birge:87b,Birge:97}.
A novel feature of our approach is that our learning algorithm crucially uses a new property testing algorithm as a key subroutine.
The learning algorithm uses the property tester to efficiently
decompose the $k$-modal distribution into $k$ (near)-monotone distributions, which are easier to
learn.
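To illustrate the definition (this is not part of the paper's algorithm), the following sketch counts the peaks and valleys of an explicitly given histogram; monotone histograms give 0 and unimodal ones give 1, matching the $k$ of a $k$-modal distribution under the convention above.

def num_peaks_and_valleys(hist):
    # Count the local extrema ("peaks" and "valleys") of a histogram after
    # collapsing runs of equal values.
    vals = [hist[0]]
    for v in hist[1:]:
        if v != vals[-1]:
            vals.append(v)
    count = 0
    for i in range(1, len(vals) - 1):
        if (vals[i] > vals[i - 1]) != (vals[i + 1] > vals[i]):
            count += 1
    return count

print(num_peaks_and_valleys([0.05, 0.10, 0.40, 0.30, 0.15]))        # 1: unimodal
print(num_peaks_and_valleys([0.10, 0.30, 0.15, 0.25, 0.05, 0.15]))  # 4: 4-modal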
Noise Stable Halfspaces are Close to Very Small Juntas
[abstract][pdf] I. Diakonikolas, R. Jaiswal, R. Servedio, L.-Y. Tan, A. Wan
Chicago Journal of Theoretical Computer Science, 2015
Bourgain~\cite{Bourgain:02} showed that any noise stable Boolean function $f$
can be well-approximated by a junta.
In this note we give an exponential sharpening of the parameters of
Bourgain's result under the additional assumption that $f$ is a halfspace.
Supervised Design Space Exploration by Compositional Approximation of Pareto sets
[abstract][pdf] H.-Y. Liu, I. Diakonikolas, M. Petracca, L.P. Carloni
Proceedings of the 48th Design Automation Conference (DAC 2011)
Technology scaling allows the integration of billions of transistors on the same
die, but CAD tools struggle to keep up with the increasing design complexity.
Design productivity for multi-core SoCs increasingly depends on
creating and maintaining reusable components and hierarchically
combining them to form larger composite cores.
Characterizing such composite cores with respect to their power/performance
trade-offs is critical for design reuse across various products and relies
heavily on synthesis tools.
We present $\mathrm{CAPS}$, an online adaptive algorithm that efficiently
explores the design space of any given core and returns an accurate
characterization of its implementation trade-offs in terms of an approximate
Pareto set.
It does so by supervising the order of the time-consuming logic-synthesis runs
on the core's components.
Our algorithm provably achieves the desired approximation precision in
the shortest possible time, without requiring any a priori information about any
component.
We also show that, in practice, $\mathrm{CAPS}$ works even better than what is guaranteed
by the theory.
Disjoint-Path Facility Location: Theory and Practice
[abstract][pdf] L. Breslau, I. Diakonikolas, N. Duffield, Y. Gu, M.T. Hajiaghayi, D.S. Johnson, H. Karloff, M. Resende, S. Sen
Proceedings of the 13th Workshop on Algorithm Engineering and Experiments (ALENEX 2011)
This paper is a theoretical and experimental study of two related facility
location problems that emanated from networking. Suppose we are given a
network modeled as a directed graph $G = (V,A)$, together with
(not-necessarily-disjoint) subsets $C$ and $F$ of $V$ , where $C$ is a set of
customer locations and $F$ is a set of potential facility locations (and
typically $C\subseteq F$). Our goal is to find a minimum sized subset $F' \subseteq F$
such that for every customer $c \in C$ there are two locations $f_1, f_2 \in F'$
such that traffic from $c$ to $f_1$ and to $f_2$ is routed on disjoint paths
(usually shortest paths) under the network's routing protocols.
Although we prove that this problem is impossible to approximate in the
worst case even to within a factor of $2^{\log^{1-\epsilon} n}$ for any $\epsilon>0$
(assuming no NP-complete language can be solved in quasi-polynomial
time), we show that the situation is much better in practice. We
propose three algorithms that build solutions and determine lower
bounds on the optimum solution, and evaluate them on several large real
ISP topologies and on synthetic networks designed to reflect real-world
LAN/WAN network structure. Our main algorithms are (1) an algorithm
that performs multiple runs of a straightforward randomized greedy
heuristic and returns the best result found, (2) a genetic
algorithm that uses the greedy algorithm as a subroutine, and (3) a new
"Double Hitting Set" algorithm. All three approaches perform surprising
well, although, in practice, the most cost-effective approach is the
multirun greedy algorithm. This yields results that average within 0.7%
of optimal for our synthetic instances and within 2.9% for our
real-world instances, excluding the largest (and most realistic) one.
For the latter instance, the other two algorithms come into their own,
finding solutions that are more than three times better than those of
the multi-start greedy approach. In terms of our motivating monitoring
application, where every customer location can be a facility location,
the results are even better. Here the above Double Hitting Set solution
is 90% better than the default solution which places a monitor at each
customer location. Our results also show that, on
average for our real-world instances, we could save an additional 18%
by choosing the (shortest path) routes ourselves, rather than taking
the simpler approach of relying on the network to choose them for us.
Hardness Results for Agnostically Learning Low-Degree Polynomial Threshold Functions
[abstract][pdf] I. Diakonikolas, R. O'Donnell, R. Servedio, Y. Wu
Proceedings of the 22nd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2011)
Hardness results for maximum agreement problems have close connections to hardness results for
proper learning in computational learning theory.
In this paper we prove two hardness results for the problem of finding a low degree polynomial threshold function (PTF)
which has the maximum possible agreement with a given set of labeled examples in $\mathbf{R}^n \times \{-1,1\}.$
We prove that for any constants $d\geq 1, \epsilon > 0$,
(1) Assuming the Unique Games Conjecture, no polynomial-time algorithm can find a degree-$d$ PTF that
is consistent with a $(1/2 + \epsilon)$ fraction of a given set of labeled examples in $\mathbf{R}^n \times \{-1,1\}$,
even if there exists a degree-$d$ PTF that is consistent with a $1-\epsilon$ fraction of the examples.
(2) It is NP-hard to find a degree-$2$ PTF that is consistent with
a $(1/2 + \epsilon)$ fraction of a given set of labeled examples in $\mathbf{R}^n \times \{-1,1\}$, even if
there exists a halfspace (degree-$1$ PTF) that is consistent with a $1 - \epsilon$ fraction of the
examples.
These results immediately imply the following hardness of learning results: (i) Assuming the
Unique Games Conjecture, there is no better-than-trivial proper learning algorithm that agnostically learns degree-$d$ PTFs under arbitrary distributions;
(ii) There is no better-than-trivial learning algorithm that outputs degree-$2$ PTFs and agnostically learns halfspaces (i.e., degree-$1$ PTFs) under arbitrary distributions.
Bounded Independence Fools Degree-$2$ Threshold Functions
[abstract][pdf] I. Diakonikolas, D. Kane, J. Nelson
Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS 2010)
Let $x$ be a random vector coming from any $k$-wise independent distribution over $\{-1,1\}^n$.
For an $n$-variate degree-$2$ polynomial $p$, we prove that $\mathbf{E}[\mathrm{sgn}(p(x))]$ is determined up to an additive $\epsilon$ for $k =
\mathrm{poly}(1/\epsilon)$. This gives a large class of explicit pseudo-random generators against such functions and answers an open
question of Diakonikolas et al. (FOCS 2009).
In the process, we develop a novel analytic technique we dub multivariate FT-mollification. This
provides a generic tool to approximate bounded (multivariate) functions by low-degree polynomials (with respect to
several different notions of approximation). A univariate version of the method was introduced by Kane et al. (SODA 2010) in
the context of streaming algorithms. In this work, we refine it and generalize it to the multivariate setting. We believe that
our technique is of independent mathematical interest. To illustrate its generality, we note that it implies a multidimensional
generalization of Jackson's classical result in approximation theory, due to Newman and Shapiro (1963).
To obtain our main result, we combine the FT-mollification technique with several linear algebraic and probabilistic
tools. These include the invariance principle of Mossel, O'Donnell and Oleszkiewicz, anti-concentration bounds for
low-degree polynomials, an appropriate decomposition of degree-$2$ polynomials, and a generalized hyper-contractive inequality
for quadratic forms which takes the operator norm of the associated matrix into account. Our analysis is quite modular; it
readily adapts to show that intersections of halfspaces and degree-$2$ threshold functions are fooled by bounded independence.
From this it follows that $\Omega(1/\epsilon^2)$-wise independence derandomizes the Goemans-Williamson hyperplane rounding scheme.
Our techniques unify, simplify, and in some cases improve several recent results in the literature concerning threshold
functions. For the case of "regular" halfspaces we give a simple proof of an optimal independence bound of
$\Theta(1/\epsilon^2)$, improving upon Diakonikolas et al. (FOCS 2009) by polylogarithmic factors. This yields the first optimal
derandomization of the Berry-Esseen theorem and -- combined with the results of Kalai et al. (FOCS 2005) -- implies a
faster algorithm for the problem of agnostically learning halfspaces.
Average Sensitivity and Noise Sensitivity of Polynomial Threshold Functions
[abstract][pdf] I. Diakonikolas, P. Raghavendra, R. Servedio, L.-Y. Tan
SIAM Journal on Computing, 43(1), 231-253 (2014)
Proceedings of the 42nd Annual ACM Symposium on Theory of Computing (STOC 2010)
(Conference version merged with this paper by Harsha, Klivans and Meka)
We give the first non-trivial upper bounds on the Boolean average
sensitivity and noise sensitivity of degree-$d$ polynomial threshold
functions (PTFs). Our bound on the Boolean average sensitivity of PTFs represents the first progress
towards the resolution of a conjecture of Gotsman and Linial \cite{GL:94}, which states that the symmetric function slicing the
middle $d$ layers of the Boolean hypercube has the highest average
sensitivity of all degree-$d$ PTFs. Via the $L_1$ polynomial
regression algorithm of Kalai et al. \cite{KKMS:08}, our bound on
Boolean noise sensitivity yields the first polynomial-time
agnostic learning algorithm for the broad class of constant-degree
PTFs under the uniform distribution.
To obtain our bound on the Boolean average sensitivity of PTFs,
we generalize the "critical-index" machinery of \cite{Servedio:07cc}
(which in that work applies to halfspaces, i.e., degree-$1$ PTFs) to general PTFs.
Together with the "invariance principle" of \cite{MOO10},
this allows us to essentially reduce the Boolean setting
to the Gaussian setting. The main ingredients used to obtain our bound
in the Gaussian setting are tail bounds and anti-concentration bounds on
low-degree polynomials in Gaussian random variables
\cite{Janson:97,CW:01}. Our bound on Boolean noise sensitivity is achieved
via a simple reduction from upper bounds on average sensitivity of Boolean
PTFs to corresponding bounds on noise sensitivity.
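For concreteness, the quantity being bounded can be computed by brute force on tiny examples; the sketch below evaluates the Boolean average sensitivity of a small, hypothetical degree-2 PTF by full enumeration.

from itertools import product

def average_sensitivity(f, n):
    # Boolean average sensitivity of f : {-1,1}^n -> {-1,1}: the expected
    # number of coordinates whose flip changes the value of f, i.e.
    # sum_i Pr_x[f(x) != f(x with coordinate i flipped)].  Full enumeration.
    total = 0
    for x in product([-1, 1], repeat=n):
        fx = f(x)
        for i in range(n):
            y = list(x)
            y[i] = -y[i]
            if f(tuple(y)) != fx:
                total += 1
    return total / 2 ** n

# Hypothetical degree-2 PTF on 3 variables: sign(x1 x2 + x2 x3 - x1).
ptf = lambda x: 1 if x[0] * x[1] + x[1] * x[2] - x[0] > 0 else -1
print(average_sensitivity(ptf, 3))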
A Regularity Lemma, and Low-weight Approximators, for low-degree Polynomial Threshold Functions
[abstract][pdf] I. Diakonikolas, R. Servedio, L.-Y. Tan, A. Wan
Theory of Computing, 10(2), 27-53 (2014)
Proceedings of the 25th Annual IEEE Conference on Computational Complexity (CCC 2010)
We give a "regularity lemma" for degree-$d$ polynomial threshold
functions (PTFs) over the Boolean cube $\{-1,1\}^n$. Roughly
speaking, this result shows that every degree-$d$ PTF can be
decomposed into a constant number of subfunctions such that almost
all of the subfunctions are close to being regular PTFs. Here a "regular" PTF
is a PTF $\mathrm{sign}(p(x))$ where the influence of each variable on the
polynomial $p(x)$ is a small fraction of the total influence of $p.$
As an application of this regularity lemma, we prove that for any constants $d \geq 1, \epsilon > 0$, every degree-$d$ PTF over $n$
variables can be approximated to accuracy $\epsilon$ by a constant-degree PTF that has integer weights of total magnitude $O_{\epsilon,d}(n^d).$
This weight bound is shown to be optimal up to logarithmic factors.
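As an illustration of the regularity notion (the polynomial below is hypothetical), the influence of a variable on a multilinear polynomial is its total squared Fourier weight on sets containing that variable; a PTF $\mathrm{sign}(p)$ is regular when no single variable carries more than a small fraction of the total.

def influences(coeffs):
    # Influence of variable i on a multilinear polynomial p over {-1,1}^n,
    # given as {frozenset(S): Fourier coefficient}: Inf_i(p) is the total
    # squared Fourier weight on sets containing i.
    n = max((max(S) for S in coeffs if S), default=-1) + 1
    inf = [0.0] * n
    for S, c in coeffs.items():
        for i in S:
            inf[i] += c * c
    return inf, sum(inf)

# Hypothetical degree-2 polynomial p(x) = 0.5 x0 x1 + 0.3 x1 x2 - 0.2 x0.
p = {frozenset({0, 1}): 0.5, frozenset({1, 2}): 0.3, frozenset({0}): -0.2}
print(influences(p))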
How Good is the Chord Algorithm?
[abstract][pdf] C. Daskalakis, I. Diakonikolas, M. Yannakakis
SIAM Journal on Computing, 45(3), pp. 811-858, 2016
Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2010)
The Chord algorithm is a popular, simple method for the succinct approximation of curves,
which is widely used, under different names, in a variety of areas such as
multiobjective and parametric optimization, computational geometry, and graphics.
We analyze the performance of the Chord algorithm, as compared to the
optimal approximation that achieves a desired accuracy with the minimum number of points.
We prove sharp upper and lower bounds, both in the worst case and average case setting.
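For concreteness, the recursion at the heart of the Chord algorithm can be sketched on an explicitly sampled polyline (the paper analyzes the oracle-based version on actual curves; on point sequences the same rule is essentially the Ramer-Douglas-Peucker simplification): keep the endpoints, find the intermediate point farthest from the chord joining them, and recurse on the two halves whenever that distance exceeds the tolerance.

def dist_to_chord(p, a, b):
    # Perpendicular distance from point p to the chord through a and b.
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    norm = (dx * dx + dy * dy) ** 0.5
    if norm == 0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    return abs(dy * (px - ax) - dx * (py - ay)) / norm

def chord(points, eps):
    # Keep the endpoints; if some intermediate point lies more than eps away
    # from the chord joining them, split there and recurse on both halves.
    a, b = points[0], points[-1]
    idx, dist = max(((i, dist_to_chord(p, a, b))
                     for i, p in enumerate(points[1:-1], start=1)),
                    key=lambda t: t[1], default=(None, 0.0))
    if idx is None or dist <= eps:
        return [a, b]
    return chord(points[:idx + 1], eps)[:-1] + chord(points[idx:], eps)

curve = [(x / 100.0, (x / 100.0) ** 2) for x in range(101)]   # samples of y = x^2
print(chord(curve, eps=0.01))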
Bounded Independence Fools Halfspaces
[abstract][pdf] I. Diakonikolas, P. Gopalan, R. Jaiswal, R. Servedio, E. Viola
SIAM Journal on Computing, 39(8), 3441-3462 (2010)
Proceedings of the 50th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2009)
We show that any distribution on $\{-1, 1\}^n$ that is $k$-wise
independent fools any halfspace (a.k.a. linear threshold function) $h : \{-1, 1\}^n \to \{-1, 1\}$, i.e.,
any function of the form $h(x) = \mathrm{sign}(\sum_{i = 1}^n w_i x_i - \theta)$ where the $w_1,\ldots,w_n,\theta$ are arbitrary real
numbers, with error $\epsilon$ for $k = O(\epsilon^{-2}\log^2(1/\epsilon))$. Our result is tight up
to $\log(1/\epsilon)$ factors. Using standard constructions of $k$-wise independent distributions, we obtain the first
explicit pseudorandom generators $G : \{-1, 1\}^s \to \{-1, 1\}^n$ that fool halfspaces.
Specifically, we fool halfspaces with error $\epsilon$ and
seed length $s = k \cdot \log n = O(\log n \cdot \epsilon^{-2} \log^2(1/\epsilon))$.
Our approach combines classical tools from real approximation theory with
structural results on halfspaces by Servedio (Comput. Complexity 2007).
Improved Approximation of Linear Threshold Functions
[abstract][pdf] I. Diakonikolas, R. Servedio
Computational Complexity, 22(3), 623-677 (2013)
Proceedings of the 24th Annual IEEE Conference on Computational Complexity (CCC 2009)
We prove two main results on how arbitrary linear threshold
functions $f(x) = \mathrm{sign}(w\cdot x - \theta)$ over the $n$-dimensional
Boolean hypercube can be approximated by simple threshold functions.
Our first result shows that every $n$-variable threshold function
$f$ is $\epsilon$-close to a threshold function depending only on
$\mathrm{Inf}(f)^2 \cdot \mathrm{poly}(1/\epsilon)$ many variables, where $\mathrm{Inf}(f)$
denotes the total influence or average sensitivity of $f.$ This is
an exponential sharpening of Friedgut's well-known theorem
\cite{Friedgut:98}, which states that every Boolean function $f$ is
$\epsilon$-close to a function depending only on $2^{O(\mathrm{Inf}(f)/\epsilon)}$
many variables, for the case of threshold functions. We complement
this upper bound by showing that $\Omega(\mathrm{Inf}(f)^2 + 1/\epsilon^2)$
many variables are required for $\epsilon$-approximating threshold
functions.
Our second result is a proof that every $n$-variable threshold
function is $\epsilon$-close to a threshold function with integer
weights at most $\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}.$ This
is an improvement, in the dependence on the error
parameter $\epsilon$, on an earlier result of \cite{Servedio:07cc} which
gave a $\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2})}$ bound. Our
improvement is obtained via a new proof technique that uses strong
anti-concentration bounds from probability theory. The new technique
also gives a simple and modular proof of the original
\cite{Servedio:07cc} result, and extends to give low-weight
approximators for threshold functions under a range of probability
distributions other than the uniform distribution.
Efficiently Testing Sparse $GF(2)$ Polynomials
[abstract][pdf] I. Diakonikolas, H. Lee, K. Matulef, R. Servedio, A. Wan
Algorithmica, 61(3), 580-605 (2011)
Proceedings of the 35th Intl. Colloquium on Automata, Languages and Programming (ICALP 2008)
We give the first algorithm that is both query-efficient and
time-efficient for testing whether an unknown function $f: \{0,1\}^n \to
\{0,1\}$ is an $s$-sparse $GF(2)$ polynomial versus $\epsilon$-far from
every such polynomial. Our algorithm makes $\mathrm{poly}(s,1/\epsilon)$
black-box queries to $f$ and runs in time $n \cdot \mathrm{poly}(s,1/\epsilon)$.
The only previous algorithm for this testing problem \cite{DLM+:07}
used $\mathrm{poly}(s,1/\epsilon)$ queries, but had running time exponential in
$s$ and super-polynomial in $1/\epsilon$.
Our approach significantly extends the "testing by implicit
learning" methodology of \cite{DLM+:07}. The learning component of
that earlier work was a brute-force exhaustive search over a concept
class to find a hypothesis consistent with a sample of random
examples. In this work, the learning component is a
sophisticated exact learning algorithm for sparse $GF(2)$
polynomials due to Schapire and Sellie \cite{SchapireSellie:96}. A
crucial element of this work, which enables us to simulate the
membership queries required by \cite{SchapireSellie:96}, is an
analysis establishing new properties of how sparse $GF(2)$
polynomials simplify under certain restrictions of "low-influence"
sets of variables.
Succinct Approximate Convex Pareto Curves
[abstract][pdf] I. Diakonikolas, M. Yannakakis
Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2008)
We study the succinct approximation of convex Pareto curves of multiobjective optimization problems.
We propose the concept of $\epsilon$-convex Pareto ($\epsilon$-CP) set as the appropriate one for the convex setting, and
observe that it can offer arbitrarily more compact representations than $\epsilon$-Pareto sets in this context. We characterize
when an $\epsilon$-CP can be constructed in polynomial time in terms of an efficient routine $\textrm{Comb}$ for optimizing
(exactly or approximately) monotone linear combinations of the objectives. We investigate the problem of computing minimum size
$\epsilon$-convex Pareto sets, both for discrete (combinatorial) and continuous (convex) problems, and present general
algorithms using a $\textrm{Comb}$ routine. For bi-objective problems, we show that if we have an exact $\textrm{Comb}$
optimization routine, then we can compute the minimum $\epsilon$-CP for continuous problems (this applies for example to
bi-objective Linear Programming and Markov Decision Processes), and factor 2 approximation to the minimum $\epsilon$-CP for
discrete problems (this applies for example to bi-objective versions of polynomial-time solvable combinatorial problems such as
Shortest Paths, Spanning Tree, etc.). If we have an approximate $\textrm{Comb}$ routine, then we can compute factor 3 and 6
approximations respectively to the minimum $\epsilon$-CP for continuous and discrete bi-objective problems.
We consider also the case of three and more objectives and present some upper and lower bounds.
Testing for Concise Representations
[abstract][pdf] I. Diakonikolas, H. Lee, K. Matulef, K. Onak, R. Rubinfeld, R. Servedio, A. Wan
Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2007)
Oded Goldreich's Choices #39
We describe a general method for testing whether a function on $n$
input variables has a concise representation. The approach combines
ideas from the junta test of Fischer et al. \cite{FKR+:04} with ideas
from learning theory, and yields property testers that make
poly$(s/\epsilon)$ queries (independent of $n$) for Boolean function
classes such as $s$-term DNF formulas
(answering a question posed by Parnas et al. \cite{PRS02}),
size-$s$ decision trees, size-$s$ Boolean formulas,
and size-$s$ Boolean circuits.
The method can be applied to non-Boolean valued function classes as
well. This is achieved via a generalization of the notion of
variation from Fischer et al. to non-Boolean functions. Using
this generalization we extend the original junta test of Fischer
et al. to work for non-Boolean functions, and give
poly$(s/\epsilon)$-query testing algorithms for non-Boolean valued
function classes such as size-$s$ algebraic circuits and $s$-sparse
polynomials over finite fields.
We also prove an $\tilde\Omega(\sqrt{s})$ query lower bound for
non-adaptively testing $s$-sparse polynomials over finite fields of
constant size. This shows that in some instances, our general method
yields a property tester with query complexity that is optimal (for nonadaptive
algorithms) up to a polynomial factor.
Small Approximate Pareto Sets for Biobjective Shortest Paths and Other Problems
[abstract][pdf] I. Diakonikolas, M. Yannakakis
SIAM Journal on Computing, 39(4), 1340-1371 (2009)
Proceedings of the 10th Intl. Workshop on Approximation, Randomization, and Combinatorial Optimization (APPROX 2007)
Honorable Mention, George Nicholson student paper competition
INFORMS society, 2009
We investigate the problem of computing a minimum set of solutions that approximates within a specified accuracy $\epsilon$ the
Pareto curve of a multiobjective optimization problem. We show that for a broad class of bi-objective problems (containing
many important widely studied problems such as shortest paths, spanning tree, matching and many others), we can compute in
polynomial time an $\epsilon$-Pareto set that contains at most twice as many solutions as the minimum such set. Furthermore we
show that the factor of $2$ is tight for these problems, i.e., it is NP-hard to do better. We present upper and lower bounds
for three or more objectives, as well as for the dual problem of computing a specified number $k$ of solutions which provide a
good approximation to the Pareto curve.
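To make the notion concrete (illustrative sketch only; the paper's algorithms access the solution space through general optimization routines and achieve the guarantees stated above), an $\epsilon$-Pareto set for an explicit finite set of bi-objective points, both objectives minimized, can be built greedily:

def eps_pareto(points, eps):
    # Greedy eps-Pareto set for an explicit finite set of bi-objective points,
    # both objectives to be minimized: every point ends up within a factor
    # (1+eps), in each coordinate, of some selected point.
    covers = lambda q, p: q[0] <= (1 + eps) * p[0] and q[1] <= (1 + eps) * p[1]
    remaining = sorted(points)                   # by the first objective
    chosen = []
    while remaining:
        p = remaining[0]                         # uncovered point with smallest x
        candidates = [q for q in remaining if q[0] <= (1 + eps) * p[0]]
        q = min(candidates, key=lambda t: t[1])  # reach as far down in y as possible
        chosen.append(q)
        remaining = [r for r in remaining if not covers(q, r)]
    return chosen

pts = [(1.0, 10.0), (1.04, 6.0), (2.0, 5.0), (3.0, 3.0), (6.0, 2.9), (9.0, 1.0)]
print(eps_pareto(pts, eps=0.1))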
Thesis
Approximation of Multiobjective Optimization Problems
[abstract][pdf] Ph.D. Thesis, Dept. of Computer Science, Columbia University, May 2011
Awarded with Distinction (for Best Ph.D. thesis in Computer Science)
We study optimization problems with multiple objectives. Such problems are pervasive across many diverse disciplines -- in
economics, engineering, healthcare, biology, to name but a few -- and heuristic approaches to solve them have already been
deployed in several areas, in both academia and industry. Hence, there is a real need for a rigorous investigation of the
relevant questions.
In such problems we are interested not in a single optimal solution, but in the tradeoff between the different objectives. This
is captured by the tradeoff or Pareto curve, the set of all feasible solutions whose vector of the various
objectives is not dominated by any other solution. Typically, we have a small number of objectives and we wish to plot the
tradeoff curve to get a sense of the design space. Unfortunately, the tradeoff curve typically has exponential size for
discrete optimization problems, even for two objectives (and is typically infinite for continuous problems). Hence, a natural
goal in this setting is, given an instance of a multiobjective problem, to efficiently obtain a "good" approximation to the
entire solution space with ``few'' solutions. This has been the underlying goal in much of the research in the multiobjective
area, with many heuristics proposed for this purpose, typically however without any performance guarantees or complexity
analysis.
We develop efficient algorithms for the succinct approximation of the Pareto set for a large class of multiobjective problems.
First, we investigate the problem of computing a minimum set of solutions that approximates within a specified accuracy the
Pareto curve of a multiobjective optimization problem. We provide approximation algorithms with tight performance guarantees
for bi-objective problems and make progress for the more challenging case of three and more objectives. Subsequently, we
propose and study the notion of the approximate convex Pareto set; a novel notion of approximation to the Pareto set, as the
appropriate one for the convex setting. We characterize when such an approximation can be efficiently constructed and
investigate the problem of computing minimum size approximate convex Pareto sets, both for discrete and convex problems. Next,
we turn to the problem of approximating the Pareto set as efficiently as possible. To this end, we analyze the Chord algorithm,
a popular, simple method for the succinct approximation of curves, which is widely used, under different names, in a variety of
areas such as multiobjective and parametric optimization, computational geometry, and graphics.