MIT Stochastics and Statistics Seminar

Organizers:

Victor Chernozhukov
David Gamarnik
Phillipe Rigollet
Devavrat Shah

For organizational matters, or to be added to the mailing list, please contact Lizzie Raymer.

IDSS

Subscribe to Calendar (iCal)

Maps

Directions

Fall 2015

Dec 11th	Peter Bartlett (UC Berkeley) 32-141, 11-12 Efficient Optimal Strategies for Universal Prediction
Dec 4th	Eric Tchetgen Tchetgen (Harvard University) 32-141, 11-12 Next Generation Missing Data Methodology
Nov 20th	James Robins (Harvard University) 32-141, 11-12 Minimax Estimation of Nonlinear Functionals with Higher Order Influence Functions: Results and Applications
Nov 13th	Jun Liu (Harvard University) 32-141, 11-12 Expansion of biological pathways by integrative Genomics
Nov 6th	Gábor Lugosi (Pompeu Fabra University) 32-141, 11-12 On a High-Dimensional Random Graph Process
Oct 30th	Rina Foygel Barber (University of Chicago) 32-141, 11-12 MOCCA: a primal/dual algorithm for nonconvex composite functions with applications to CT imaging
Oct 23rd	Robert Nowak (University of Wisconsin) 32-141, 11-12 Ranking and Embedding From Pairwise Comparisons
Oct 16th	Stefan Wager (Stanford University) 32-141, 11-12 Causal Inference with Random Forests
Oct 9th	Roman Vershynin (University of Michigan) 32-141, 11-12 Discovering hidden structures in complex networks
Oct 2nd	Constantine Caramanis (University of Texas at Austin) 32-141, 11-12 Fast algorithms and (other) min-max optimal algorithms for mixed regression
Sep 25th	Edo Airoldi (Harvard University) 32-141, 11-12 Some Fundamental Ideas for Causal Inference on Large Networks
Sep 18th	Mustazee Rahman (MIT Mathematics) 32-124, 11-12 Independent sets, local algorithms and random regular graphs
Sep 11th	Rob Freund (MIT Sloan) 32-124, 11-12 An Extended Frank-Wolfe Method with Application to Low-Rank Matrix Completion

Spring 2015

May 8th	Lester Mackey (Stanford) E62-450, 11-12 Measuring Sample Quality with Stein's Method
May 1st	Han Liu (Princeton) E62-450, 11-12 Nonparametric Graph Estimation
Apr 24th	Ankur Moitra (MIT CSAIL) E62-450, 11-12 Tensor Prediction, Rademacher Complexity and Random 3-XOR
Apr 17th	Vianney Perchet (Université Paris Diderot) E62-450, 11-12 From Bandits to Ethical Clinical Trials. Optimal Sample Size for Multi-Phases Problems.
Apr 10th	Moritz Hardt (IBM Almaden) E62-450, 11-12 How good is your model? Guilt-free interactive data analysis.
Feb 27th	Nike Sun (MSR New England and MIT Mathematics) E62-450, 11-12 The exact k-SAT threshold for large k.
Feb 13th	Victor-Emmanuel Brunel (Yale) E17-133, 11-12 Random polytopes and estimation of convex bodies
Feb 6th	Denis Chetverikov (UCLA) E62-450, 11-12 Central Limit Theorems and Bootstrap in High Dimensions

Fall 2014

Dec 12th	Whitney Newey (MIT Economics) E62-587, 11-12 Linear Regression with Many Included Covariates
Dec 5th	Harrison Huibin Zhou (Yale University) E62-587, 11-12 Sparse Canonical Correlation Analysis: Minimaxity and Adaptivity
Nov 21st	Alfred Galichon (Sciences Po, Paris) E62-587, 11-12 Optimal stochastic transport
Nov 7th	Constantinos Daskalakis (MIT EECS) E62-587, 11-12 Beyond Berry Esseen: Structure and Learning of Sums of Random Variables
Oct 31st	Yuan Liao (University of Maryland) E62-587, 11-12 High Dimensional Covariance Matrix Estimations and Factor Models
Oct 24th	Anna Mikusheva (MIT Economics) E62-687, 11-12 A Geometric Approach to Weakly Identified Econometric Models
Oct 10th	Vladimir Koltchinskii (Georgia Tech) E62-650, 11-12 Asymptotics and concentration for sample covariance
Sep 23rd	Richard Nickl (University of Cambridge) E62-587, 12-1 Uncertainty quantification and confidence sets in high-dimensional models
Aug 15th	Florent Krzakala (Université Pierre et Marie) E62-587, 2-3 Superposition codes and approximate-message-passing decoder
Aug 15th	Lenka Zdeborova (CEA) E62-587, 3:15-4:15 Clustering of sparse networks: Phase transitions and optimal algorithms

Spring 2014

May 30th	Nathan Kallus (MIT) E62-587, 11-12 Regression-Robust Designs of Controlled Experiments
May 23rd	Alex Belloni (Duke University) E62-587, 11-12 Uniform Post Selection Inference for Z-estimation problems
May 16th	Antar Bandyopadhyay (University of California, Berkeley) E62-587, 11-12 De-Preferential Attachment Random Graphs
May 2nd	Sahand Negahban (Yale University) E62-587, 2-3 Computationally and Statistically Efficient Estimation in High-Dimensions
Apr 25th	Joel Spencer (Courant Institute, New York University) E62-587, 11-12 Avoiding Outliers
Apr 18th	Sébastien Bubeck (Princeton University) E62-587, 11-12 On the influence of the seed graph in the preferential attachment model
Apr 11th	Alexander Rakhlin (University of Pennsylvania, The Wharton School) E62-587, 11-12 Learning and estimation: separated at birth, reunited at last
Mar 21st	Alexandre Tsybakov (CREST-ENSAE) E62-587, 11-12 Linear and Conic Programming Approaches to High-Dimensional Errors-in-variables Models
Mar 7th	Karthekeyan Chandrasekaran (Harvard) E62-587, 11-12 Integer Feasibility of Random Polytopes
Feb 7th	Michael Brautbar (MIT) E62-587, 11-12 On the Power of Adversarial Infections in Networks

Fall 2013

Dec 13th	Nelly Litvak (University of Twente) E62-587, 11-12 Degree-degree dependencies in random graphs with heavy-tailed degrees
Nov 15th	Ramon van Handel (Princeton University) E62-587, 11-12 Conditional Phenomena in Time and Space
Nov 13th	Lie Wang (MIT) E62-587, 4-5 Multivariate Regression with Calibration
Nov 8th	David Choi (Heinz College, Carnegie Mellon University) E62-587, 11-12 Consistency of Co-clustering exchangeable graph data
Sep 27th	Jim Dai (Cornell University) E62-587, 11-12 Semimartingale reflecting Brownian motions: tail asymptotics for stationary distributions
Sep 20th	Elad Hazan (Technion) Sublinear Optimization

Spring 2013

May 3rd	Rahul Jain (University of Southern California) Transitory Queueing Systems
April 12th	Yashodhan Kanoria (MSR New England and Columbia University) Which side chooses in large random matching markets?
February 8th	Rahul Jain (University of Southern California) The Art of Gambling in a Team: Multi-Player Multi-Armed Bandits

Fall 2012

November 30th	Rahul Mazumder (MIT) Low-rank Matrix Completion: Statistical Models and Large Scale Algorithms
November 16th	Kuang Xu (MIT) Queueing system topologies with limited flexibility
November 2nd	Shankar Bhamidi (University of North Carolina) Cancelled
October 26th	Philippe Rigollet (Princeton University) Optimal detection of a sparse principal component

Spring 2012

May 18th	Alexei Borodin (Massachusetts Institute of Technology) Growth of random surfaces
April 27th	Semen Shlosman (CNRS, France and Institute for Information Transmission Problems, Russia) The Coherence Phase Transition
April 27th	Alexander Rybko (Institute for Information Transmission Problems, Russia) Mean-field Limit for General Queueing Networks on Infinite Graphs
April 20th	Guy Bresler (University of California, Berkeley) Information theory of DNA sequencing
March 26th	Andrei Raigorodksii (Yandex and Moscow Institute of Physics and Technology) Research Groups at Yandex and Moscow Institute of Physics and Technology
March 27th	Andrei Raigorodksii (Yandex and Moscow Institute of Physics and Technology) Web Graph Models and Their Applications
March 28th	Daniil Musatov (Yandex and Moscow Institute of Physics and Technology) Conditional Coding with Limiting Computational Resources
March 29th	Liudmila Ostroumova (Yandex and Moscow Institute of Physics and Technology) An Application of Talagrand's Inequality to Prove a Concentration of Second Degrees in Buckley-Osthus Random Graph Model
March 30th	Dmitry Shabanov (Yandex and Moscow Institute of Physics and Technology) Van der Warden Number and Coloring of Hypergraphs with Large Girth

Fall 2011

Dec 16th	Yuan Zhong (MIT ORC) Delay optimality in switched networks
Oct 21st	Eitan Bachmat (Ben-Gurion University) Does god play dice? An I/O scheduling and airplane boarding perspective.
Oct 7th	Erol Peköz (Boston University) Asymptotics for preferential attachment random graphs via Stein's method

Spring 2009

May 1th	Vivek Goyal (MIT EECS) On Resolution, Sparse Signal Recovery, and Random Access Communication
Apr 17th	Vivek F. Farias (MIT Sloan) The Smoothed Linear Program for Approximate Dynamic Programming
Apr 10th	Scott Sheffield (MIT Math) Fractional simple random walk
Apr 3rd	Mokshay Madiman (Yale University) A New Look at the Compound Poisson Distribution and Compound Poisson Approximation using Entropy
Mar 13rd	Mohsen Bayati (Microsoft Research New England) Sequential algorithms for generating random graphs

Fall 2008

Sep 23rd	Benoît Collins (University of Ottawa) Convergence of unitary matrix integrals

Spring 2008

May 23rd	Johan van Leeuwaarden (Eindhoven University of Technology, EURANDOM, NYU) The Gaussian random walk, sampling Brownian motion, and the Riemann zeta function
Apr 25th	Edward Farhi (MIT Physics) Quantum Computation by Adiabatic Evolution
Apr 18th	Ton Dieker (Georgia Tech I&SE) Large deviations for random walks under subexponentiality: the big-jump domain
Apr 10th	Peter Glynn (Stanford MS&E) Bounds on Stationary Expectations for Markov Processes
Mar 21st	Daron Acemoglu (MIT Econ) Fragility of Asymptotic Agreement under Bayesian Learning
Feb 29th	Victor Chernozhukov (MIT Econ & ORC) Quantile and Probability Curves without Crossing

Fall 2007

Nov 16th	David Forney (MIT LIDS) Exponential Error Bounds for Random Codes on the BSC

Abstracts

Efficient Optimal Strategies for Universal Prediction

Peter Bartlett (UC Berkeley)

In game-theoretic formulations of prediction problems, a strategy makes a decision, observes an outcome and pays a loss. The aim is to minimize the regret, which is the amount by which the total loss incurred exceeds the total loss of the best decision in hindsight. This talk will focus on the minimax optimal strategy, which minimizes the regret, in three settings: prediction with log loss (a formulation of sequential probability density estimation that is closely related to sequential compression, coding, gambling and investment problems), sequential least squares (where decisions and outcomes lie in a subset of a Hilbert space, and loss is squared distance), and linear regression (where the aim is to predict real-valued labels as well as the best linear function). We obtain the minimax optimal strategy for these problems, and show that it can be efficiently computed.
top of page

Next Generation Missing Data Methodology

Eric Tchetgen Tchetgen (Harvard University)

Missing data is a reality of empirical sciences and can rarely be prevented entirely. It is often assumed that incomplete data are missing completely at random (MCAR) or missing at random (MAR), When neither MCAR nor MAR, missingness is said to be Not MAR (NMAR). Under MAR, there are two main approaches to inference, likelihood/Bayesian inference, e.g. EM or MI, and semiparametric approaches such as Inverse probability weighting (IPW). In several important settings, likelihood based inferences suffer the difficulty of relying on modeling assumptions that may conflict with the model of substantive interest, while semiparametric methods clearly avoid this problem by relying on a model of the nonresponse process. In the common setting where missingness patterns are nonmonotone, it is difficult to model the nonresponse process without imposing an assumption stronger than MAR. This gap in the literature is resolved and a novel approach is proposed for modeling a nonmonote nonresponse process under MAR for use with IPW. Despite this solution, it is however argued that MAR is hard to justify on causal grounds when missingness is nonmonote, and therefore, new methods are considered for identification and inference when nonmonotone missing data are NMAR. Throughout, it is illustrated that the methods using both simulations and empirical illustrations in HIV research.
top of page

Minimax Estimation of Nonlinear Functionals with Higher Order Influence Functions: Results and Applications

James Robins (Harvard University)

Professor Robins describes a novel approach to point and interval estimation of nonlinear functionals in parametric, semi-, and non-parametric models based on higher order influence functions. Higher order influence functions are higher order U-statistics. The approach applies equally to both $\sqrt{n}$ and non-$\sqrt{n}$ problems. It reproduces results previously obtained by the modern theory of non-parametric inference, produces many new $\sqrt{n}$ and non-$\sqrt{n}$ results, and opens up the ability to perform non-$\sqrt{n}$ inference in complex high dimensional models, such as models for the average causal effects of not only time independent treatments but also of time varying treatments in the presence of time varying confounding variables and informative censoring. The estimators are always rate minimax in $\sqrt{n}$ problems and often in non-$\sqrt{n}$ problems as well. The approach can also be used to derive novel nonparametric tests of unconditional and conditional independence with good power against complex alternatives.
top of page

Expansion of biological pathways by integrative Genomics

Jun Liu (Harvard University)

The number of publicly available gene expression datasets has been growing dramatically. Various methods had been proposed to predict gene co-expression by integrating the publicly available datasets. These methods assume that the genes in the query gene set are homogeneously correlated and consider no gene-specific correlation tendencies, no background intra-experimental correlations, and no quality variations of different experiments. We propose a two-step algorithm called CLIC (CLustering by Inferred Co-expression) based on a coherent Bayesian model to overcome these limitations. CLIC first employs a Bayesian partition model with feature selection to partition the gene set into disjoint co-expression modules (CEMs), simultaneously assigning posterior probability of selection to each dataset. In the second step, CLIC expands each CEM by scanning the whole reference genome for candidate genes that were not in the input gene set but co-expressed with the genes in this CEM. CLIC is capable of integrating over thousands of gene expression datasets to achieve much higher co-expression prediction accuracy compared to traditional co-expression methods. Application of CLIC to ~1000 annotated human pathways and ~6000 poorly characterized human genes reveals new components of some well-studied pathways and provides strong functional predictions for some poorly characterized genes. We validated the predicted association between protein C7orf55 and ATP synthase assembly using CRISPR knock-out assays.

Based on the joint work with Yang Li and the Vamsi Mootha lab.
top of page

On a High-Dimensional Random Graph Process

Gábor Lugosi (Pompeu Fabra University)

We introduce a model for a high-dimensional random graph process and ask how "rich" the process has to be so that one finds atypical behavior. In particular, we study a natural process of Erdös-Rényi random graphs indexed by unit vectors in R^d . We investigate the deviations of the process with respect to three fundamental properties: clique number, chromatic number, and connectivity. The talk is based on joint work with Louigi Addario-Berry, Shankar Bhamidi, Sebastien Bubeck, Luc Devroye, and Roberto Imbuzeiro Oliveira.
top of page

MOCCA: a primal/dual algorithm for nonconvex composite functions with applications to CT imaging

Rina Foygel Barber (University of Chicago)

Many optimization problems arising in high-dimensional statistics decompose naturally into a sum of several terms, where the individual terms are relatively simple but the composite objective function can only be optimized with iterative algorithms. Specifically, we are interested in optimization problems of the form F(Kx) + G(x), where K is a fixed linear transformation, while F and G are functions that may be nonconvex and/or nondifferentiable. In particular, if either of the terms are nonconvex, existing alternating minimization techniques may fail to converge; other types of existing approaches may instead be unable to handle nondifferentiability. We propose the MOCCA (mirrored convex/concave) algorithm, a primal/dual optimization approach that takes local convex approximation to each term at every iteration. Inspired by optimization problems arising in computed tomography (CT) imaging, this algorithm can handle a range of nonconvex composite optimization problems, and offers theoretical guarantees for convergence when the overall problem is approximately convex (that is, any concavity in one term is balanced out by convexity in the other term). Empirical results show fast convergence for several structured signal recovery problems.
top of page

Ranking and Embedding From Pairwise Comparisons

Robert Nowak (University of Wisconsin)

Ranking, clustering, or metrically-embedding a set of items (e.g., images, documents, products) based on human judgments can shed light on preferences and human reasoning. Two common approaches to collecting data from people are rating and comparison-based systems. Ratings can be difficult to calibrate across people. Also, in certain applications, it may be far easier to compare items than to rate them (e.g., rating funniness of jokes is more difficult than deciding which of two jokes is more funny). For these reasons, pairwise comparisons are often used in practice. This talk focuses on ranking and metric embedding from pairwise comparisons, and theory and methods for adaptive data collection in particular. Adaptive data collection can reduce the number of comparisons required to learn an accurate ranking or embedding, but is challenging to mathematically analyze. The gap between theory and practice is relatively small in the case of ranking, but many difficult mathematical questions remain for embedding from pairwise comparisons. The talk will also illustrate progress and challenges through several ranking and embedding experiments carried out with a new open-source software system called NEXT.
top of page

Causal Inference with Random Forests

Stefan Wager (Stanford University)

Many scientific and engineering challenges---ranging from personalized medicine to customized marketing recommendations---require an understanding of treatment heterogeneity. We develop a non-parametric causal forest for estimating heterogeneous treatment effects that is closely inspired by Breiman's widely used random forest algorithm. Given a potential outcomes framework with unconfoundedness, we show that causal forests are pointwise consistent for the true treatment effect, and have an asymptotically Gaussian and centered sampling distribution. We also propose a practical estimator for the asymptotic variance of causal forests. In both simulations and an empirical application, we find causal forests to be substantially more powerful than classical methods based on nearest-neighbor matching, especially as the number of covariates increases. Our theoretical results rely on a generic asymptotic normality theory for a large family of random forest algorithms. To our knowledge, this is the first set of results that allows random forests, including classification and regression forests, to be used for valid statistical inference.

This talk is based on joint work with Susan Athey, Bradley Efron, and Trevor Hastie.
top of page

Discovering hidden structures in complex networks

Roman Vershynin (University of Michigan)

Most big real-world networks (social, technological, biological) are sparse. Most of networks have noticeable structure, which can be formed by clusters (communities) and hubs. When and how can a hidden structure be recovered from a sparse network? Known approaches to this problem come from a variety of disciplines – probability, combinatorics, physics, statistics, optmization, information theory, etc. We will focus on the recently developed probabilistic approaches motivated by sparse recovery, where a network is regarded as a random measurement of the hidden structure.
top of page

Fast algorithms and (other) min-max optimal algorithms for mixed regression

Constantine Caramanis (University of Texas at Austin)

Mixture models represent the superposition of statistical processes, and are natural in machine learning and statistics. In mixed regression, the relationship between input and output is given by one of possibly several different (noisy) linear functions. Thus the solution encodes a combinatorial selection problem, and hence computing it is difficult in the worst case. Even in the average case, little is known in the realm of efficient algorithms with strong statistical guarantees.

We give general conditions for linear convergence of an EM-like (and hence fast) algorithm for latent-variable problems in high dimensions, and show this implies that for sparse (or low-rank) mixed regression, EM converges linearly, in a neighborhood of the optimal solution, in the high-SNR regime. For the low-SNR regime, we show that a new behavior emerges. Here, we give a convex optimization formulation that provably recovers the true solution, and we provide upper bounds on the recovery errors for both arbitrary noise and stochastic noise settings. We also give matching minimax lower bounds, showing that our algorithm is information-theoretically optimal.

Our results represent what is, as far as we know, the only tractable algorithm guaranteeing successful recovery with tight bounds on recovery errors and sample complexity.
top of page

Some Fundamental Ideas for Causal Inference on Large Networks

Edo Airoldi (Harvard University)

Classical approaches to causal inference largely rely on the assumption of “lack of interference”, according to which the outcome of each individual does not depend on the treatment assigned to others. In many applications, however, including healthcare interventions in schools, online education, and design of online auctions and political campaigns on social media, assuming lack of interference is untenable. In this talk, Prof. Airoldi will introduce some fundamental ideas to deal with interference in causal analyses, focusing on situations where interference can be attributed to a network among the units of analysis, and offer new results and insights for estimating causal effects in this context.
top of page

Independent sets, local algorithms and random regular graphs

Mustazee Rahman (MIT Mathematics)

A independent set in a graph is a set of vertices that have no edges between them. How large can a independent set be in a random d-regular graph? How large can it be if we are to construct it using a (possibly randomized) algorithm that is local in nature? We will discuss a notion of local algorithms for combinatorial optimization problems on large, random d-regular graphs. We will then explain why, for asymptotically large d, local algorithms can only produce independent sets of size at most half of the largest ones. The factor of 1/2 turns out to be optimal. Joint work with Balint Virag.
top of page

An Extended Frank-Wolfe Method with Application to Low-Rank Matrix Completion

Rob Freund (MIT Sloan)

We present an extension of the Frank-Wolfe method that is designed to induce near-optimal solutions on low-dimensional faces of the feasible region. We present computational guarantees for the method that trade off efficiency in computing near-optimal solutions with upper bounds on the dimension of minimal faces of iterates. We apply our method to the low-rank matrix completion problem, where low-dimensional faces correspond to low-rank solutions. We present computational results for large-scale low-rank matrix completion problems that demonstrate significant speed-ups in computing low-rank near-optimal solutions on both artificial and real datasets. This is joint work with Paul Grigas (ORC) and Rahul Mazumder (Sloan).
top of page

Measuring Sample Quality with Stein's Method

Lester Mackey (Stanford)

To carry out posterior inference on datasets of unprecedented sizes, practitioners are turning to biased MCMC procedures that trade off asymptotic exactness for computational efficiency. The reasoning is sound: a reduction in variance due to more rapid sampling can outweigh the bias introduced. However, the inexactness creates new challenges for sampler and parameter selection, since standard measures of sample quality like effective sample size do not account for asymptotic bias. To address these challenges, we introduce a new computable quality measure based on Stein's method that bounds the discrepancy between sample and target expectations over a large class of test functions. We use our tool to compare exact, asymptotically biased, and deterministic sample sequences and illustrate applications to hyperparameter selection, convergence rate assessment, and quantifying bias-variance tradeoffs in posterior inference.
top of page

Nonparametric Graph Estimation

Han Liu (Princeton)

Undirected graphical model has proven to be a useful abstraction in statistics and machine learning. The starting point is the graph of a distribution. While often the graph is assumed given, we are interested in estimating the graph from data. In this talk we present new nonparametric and semiparametric methods for graph estimation. The performance of these methods is illustrated and compared on several real and simulated examples.
top of page

Tensor Prediction, Rademacher Complexity and Random 3-XOR

Ankur Moitra (MIT CSAIL)

Here we study the tensor prediction problem, where the goal is to accurately predict the entries of a low rank, third-order tensor (with noise) given as few observations as possible. We give algorithms based on the sixth level of the sum-of-squares hierarchy that work with roughly $m= n^{3/2}$ observations, and we complement our result by showing that any attempt to solve tensor prediction with fewer observations through the sum-of-squares hierarchy would run in moderately exponential time. In contrast, information theoretically roughly $m = n$ observations suffice.

This work is part of a broader agenda of studying computational vs. statistical tradeoffs through the sum-of-squares hierarchy. In particular, for linear inverse problems (such as tensor prediction) the natural sum-of-squares relaxation gives rise to a sequence of norms. Our approach is to characterize their Rademacher complexity. Moreover, both our upper and lower bounds are based on connections between this, and the task of strongly refuting random 3-XOR formulas, and the resolution proof system.

This talk is based on joint work with Boaz Barak
top of page

From Bandits to Ethical Clinical Trials. Optimal Sample Size for Multi-Phases Problems

Vianney Perchet (Université Paris Diderot)

In the first part of this talk, I will present recent results on the problem of sequential allocations called “multi-armed bandit”. Given several i.i.d. processes, the objective is to sample them sequentially (and thus get a sequence of random rewards) in order to maximize the expected cumulative reward. This framework simultaneously encompasses issues of estimation and optimization (the so-called “exploration vs exploitation” dilemma). A recent example of applications is the ad placement on web sites.

In the second part, I will come back to the original motivation of bandit problems: clinical trials. They usually consist of a small number of phases (typically three or four). The first phase is a pilot study and treatment can be allocated at random. In the following phases, treatments are re-allocated depending on the result of the pilot study (and the subsequent phases). We will show how to theoretically choose the sizes of these phases and we shall look whether having more phases leads to significant improvements.
top of page

How good is your model? Guilt-free interactive data analysis.

Moritz Hardt (IBM Almaden)

Reliable tools for model selection and validation are indispensable in almost all applications of machine learning and statistics. Decades of theory support a widely used set of techniques, such as holdout sets, bootstrapping and cross validation methods. Yet, much of the theory breaks down in the now common situation where the data analyst works interactively with the data, iteratively choosing which methods to use by probing the same data many times. A good example are data science competitions in which hundreds of analysts repeatedly score their models on the same holdout set with the danger of overfitting to the holdout itself.

In this talk, we first discuss that, in general, it is computationally hard to prevent overfitting in interactive data analysis. Achieving what is possible in light of the hardness result, we will then see a powerful reusable holdout method that can be used many times without losing the guarantees of fresh data. We conclude with a simple and practical algorithm for maintaining a provably accurate leaderboard in machine learning competitions and demonstrate its practical utility through experiments on real submission files from a Kaggle competition.

Based on joint works with Avrim Blum, Cynthia Dwork, Vitaly Feldman, Toni Pitassi, Aaron Roth, Omer Reingold and Jon Ullman.
top of page

The exact $k$-SAT threshold for large $k$

Nike Sun (MSR New England and MIT Mathematics)

We establish the random $k$-SAT threshold conjecture for all $k$ exceeding an absolute constant $k_0$. That is, there is a single critical value $\alpha_*(k)$ such that a random $k$-SAT formula at clause-to-variable ratio $\alpha$ is with high probability satisfiable for $\alpha<\alpha_*(k)$, and unsatisfiable for $\alpha > \alpha_*(k)$. The threshold $\alpha_*(k)$ matches the explicit prediction derived by statistical physicists on the basis of the one-step replica symmetry breaking (1RSB) heuristic. In the talk I will describe the main obstacles in computing the threshold, and explain how they are overcome in our proof.

Joint work with Jian Ding and Allan Sly.
top of page

Random polytopes and estimation of convex bodies

Victor-Emmanuel Brunel (Yale)

In this talk we discuss properties of random polytopes. In particular, we study the convex hull of i.i.d. random points, whose law is supported on a convex body. We propose deviation and moment inequalities for this random polytope, and then discuss its optimality, when it is seen as an estimator of the support of the probability measure, which may be unknown.

We also define a notion of multidimensional quantile sets for probability measures in a Euclidean space. These are convex sets, which are related to the notions of floating bodies, or the Tukey depth. When i.i.d. random points are available, these multidimensional quantile sets can be estimated by their empirical versions, and we again propose deviation and moment inequalities for the empirical quantile sets.
top of page

Central Limit Theorems and Bootstrap in High Dimensions

Denis Chetverikov (UCLA)

We derive central limit and bootstrap theorems for probabilities that centered high-dimensional vector sums hit rectangles and sparsely convex sets. Specifically, we derive Gaussian and bootstrap approximations for the probabilities $\Pr(n^{-1/2}\sum_{i=1}^n X_i\in A)$ where $X_1,\dots,X_n$ are independent random vectors in $\mathbb{R}^p$ and $A$ is a rectangle, or, more generally, a sparsely convex set, and show that the approximation error converges to zero even if $p=p_n\to \infty$ and $p\gg n$; in particular, $p$ can be as large as $O(e^{Cn^c})$ for some constants $c,C>0$. The result holds uniformly over all rectangles, or more generally, sparsely convex sets, and does not require any restrictions on the correlation among components of $X_i$. Sparsely convex sets are sets that can be represented as intersections of many convex sets whose indicator functions depend nontrivially only on a small subset of their arguments, with rectangles being a special case. All of the bounds on approximation errors are non-asymptotic. The proofs rely on an effective use of Slepian-Stein methods and some new Gaussian comparison inequalities.

These results have already found many useful applications in modern high-dimensional statistics and econometrics.

(This talk is based on joint work with V. Chernozhukov and K. Kato.)

References: Arxiv; Arxiv; Tutorial by Larry Wasserman
top of page

Linear Regression with Many Included Covariates

Whitney Newey (MIT Economics)

We consider asymptotic inference for linear regression coefficients when the number of included covariates grows as fast as the sample size. We find a limiting normal distribution with asymptotic variance that is larger than the usual one. We also find that all of the usual versions of heteroskedasticity consistent standard error estimators are inconsistent under this asymptotics. The problem with these standard errors is that they do not make a correct "degrees of freedom" adjustment. We propose a new heteroskedasticity consistent standard error formula that is fully automatic and robust to both (conditional) heteroskedasticity of unknown form and the inclusion of many covariates. We illustrate our findings in three distinct settings: (i) parametric linear models with many covariates, (ii) semiparametric semilinear models with many technical regressors, and (iii) linear panel models with many fixed effects.
top of page

Sparse Canonical Correlation Analysis: Minimaxity and Adaptivity

Harrison Huibin Zhou (Yale University)

Canonical correlation analysis is a widely used multivariate statistical technique for exploring the relation between two sets of variables. In this talk we consider the problem of estimating the leading canonical correlation directions in high dimensional settings. Recently, under the assumption that the leading canonical correlation directions are sparse, various procedures have been proposed for many high dimensional applications involving massive data sets. However, there has been few theoretical justification available in the literature. In this talk, we establish rate-optimal non-asymptotic minimax estimation with respect to an appropriate loss function for a wide range of model spaces. Two interesting phenomena are observed. First, the minimax rates are not affected by the presence of nuisance parameters, namely the covariance matrices of the two sets of random variables, though they need to be estimated in the canonical correlation analysis problem. Second, we allow the presence of the residual canonical correlation directions. However, they do not influence the minimax rates under a mild condition on eigengap. A generalized sin-theta theorem and an empirical process bound for Gaussian quadratic forms under rank constraint are used to establish the minimax upper bounds, which may be of independent interest.

If time permits, we will discuss a computationally efficient two-stage estimation procedure which consists of a convex programming based initialization stage and a group Lasso based refinement stage, and show some encouraging numerical results on simulated data sets and a breast cancer data set.
top of page

Optimal stochastic transport

Alfred Galichon (Sciences Po, Paris)

We explore the link between the Monge-Kantorovich problem and the Skorohod embedding problem. This question arises in particular in Mathematical Finance when seeking model-free bounds on some option prices when the marginal distributions of the underlying at various maturities are implied by European options prices. We provide a stochastic control approach which we connect to several important constructions. Finally we revisit in this light the celebrated Azéma-Yor solution of the Skorohod embedding problem.

This talk is based on joint works with Guillaume Carlier, Pierre Henry-Labordère and Nizar Touzi.
top of page

Beyond Berry Esseen: Structure and Learning of Sums of Random Variables

Constantinos Daskalakis (MIT EECS)

The celebrated Berry-Esseen theorem, and its variants, provide a useful approximation to the sum of independent random variables by a Gaussian. In this talk, I will restrict attention to the important case of sums of integer random variables, arguing that Berry-Esseen theorems fall short from characterizing their general structure. I will offer stronger finitary central limit theorems, tightly characterizing the structure of these distributions, and show their implications to learning. In particular, I will present algorithms that can learn sums of integer random variables from a number of samples that is independent of how many variables are summed over.
top of page

High Dimensional Covariance Matrix Estimations and Factor Models

Yuan Liao (University of Maryland)

Large covariance matrix estimation is crucial for high-dimensional statistical inferences, and has also played an central role in factor analysis. Applications are found in analyzing financial risks, climate data, genomic data and PCA, etc. Commonly used approaches to estimating large covariances include shrinkages and sparse modeling. This talk will present new theoretical results on estimating large (inverse) covariance matrices under large N large T asymptotics, with a focus on the roles it plays in statistical inferences for large panel data and factor models. The application examples will include efficient estimation for factor analysis and portfolio allocations under large assets.
top of page

A Geometric Approach to Weakly Identified Econometric Models

Anna Mikusheva (MIT Economics)

Many nonlinear Econometric models show evidence of weak identification. In this paper we consider minimum distance statistics and show that in a broad class of models the problem of testing under weak identification is closely related to the problem of testing a curved null in a finite-sample Gaussian model. Using the curvature of the model, we develop new finite-sample bounds on the distribution of minimum-distance statistics, which we show can be used to detect weak identification and to construct tests robust to weak identification. We apply our new method to new Keynesian Phillips curve and DSGE examples and show that it provides a significant improvement over existing approaches.
top of page

Asymptotics and concentration for sample covariance

Vladimir Koltchinskii (Georgia Tech)

We will discuss recent moment bounds and concentration inequalities for sample covariance operators based on a sample of n i.i.d. Gaussian random variables taking values in an infinite dimensional space. These bounds show that the size of the operator norm of the deviation of sample covariance from the true covariance can be completely characterized by two parameters: the operator norm of the true covariance and its so called "effective rank". These results rely on Talagrand's generic chaining bounds and on Gaussian concentration. We then discuss several asymptotic and concentration results for spectral projectors of sample covariance in the case when the "effective rank" is large, but it is smaller than the sample size. These results include, in particular, asymptotic normality of bilinear forms of empirical spectral projectors and of squared Hilbert--Schmidt norms of their deviations from the spectral projectors of the true covariance.

Most of the results are joint with Karim Lounici
top of page

Uncertainty quantification and confidence sets in high-dimensional models

Richard Nickl (University of Cambridge)

While much attention has been paid recently to the construction of optimal algorithms that adaptively estimate low-dimensional parameters (described by sparsity, low-rank, or smoothness) in high-dimensional models, the theory of statistical inference and uncertainty quantification (in particular hypothesis tests & confidence sets) is much less well-developed. We will discuss some perhaps surprising impossibility results in the basic high-dimensional compressed sensing model, and some of the recently remerging positive results in the area.
top of page

Superposition codes and approximate-message-passing decoder

Florent Krzakala (Université Pierre et Marie)

Superposition codes are asymptotically capacity achieving scheme for the Additive White Gaussian Noise channel. I will first show how a practical iterative decoder can be built based on a Belief Propagation type approach, closely related to the one performed in compressed sensing and sparse estimation problems. Secondly, I will show how the idea of spatial coupling in this context allows to built efficient and practical capacity achieving coding and decoding schemes. The links between the present problem, sparse estimations, and non-parametric belief propagation for continuous variables will be also discussed.

References:
Replica Analysis and Approximate Message Passing Decoder for Superposition Codes, J. Barbier and F. Krzakala, ISIT 2014.
Statistical physics-based reconstruction in compressed sensing, F. Krzakala et al. Phys. Rev. X 2, 021005 (2012)
top of page

Clustering of sparse networks: Phase transitions and optimal algorithms

Lenka Zdeborova (CEA)

A central problem in analyzing networks is partitioning them into modules or communities, clusters with a statistically homogeneous pattern of links to each other or to the rest of the network. A principled approach to address this problem is to fit the network on a stochastic block model, this task is, however, intractable exactly. In this talk we discuss application of belief propagation algorithm to module detection. In the first part we present an asymptotically exact analysis of the stochastic block model. We quantify properties of the detectability/undetectability phase transition and the easy/hard phase transition for the module detection problem. In the second part we introduce a new class of spectral algorithms based on a non-backtracking walk on the directed edges of the graph. We show that our algorithm is optimal for graphs generated by the stochastic block model, detecting communities all the way down to the theoretical limit. We also show the spectrum of the non-backtracking operator for some real-world networks, illustrating its advantages over traditional spectral clustering.

References:

Phase transition in the detection of modules in sparse networks;
Aurelien Decelle, Florent Krzakala, Cristopher Moore, Lenka Zdeborová; Phys. Rev. Lett. 107, 065701 (2011).

Spectral redemption: clustering sparse networks;
Florent Krzakala, Cristopher Moore, Elchanan Mossel, Joe Neeman, Allan Sly, Lenka Zdeborová, Pan Zhang; Proceedings of the National Academy of Sciences 110, no. 52 (2013): 20935-20940
top of page

Regression-Robust Designs of Controlled Experiments

Nathan Kallus (MIT)

Achieving balance between experimental groups is a cornerstone of causal inference. Without balance any observed difference may be attributed to a difference other than the treatment alone. In controlled/clinical trials, where the experimenter controls the administration of treatment, complete randomization of subjects has been the golden standard for achieving this balance because it allows for unbiased and consistent estimation and inference in the absence of any a priori knowledge or measurements. However, since estimator variance under complete randomization may be slow to converge, experimental designs that balance pre-treatment measurements (baseline covariates) are in pervasive use, including randomized block designs, pairwise-matched designs, and re-randomization. We formally argue that absolutely no balance better than complete randomization's can be achieved without partial structural knowledge about the treatment effects. Therefore, that balancing designs are in popular use and are advocated means that partial knowledge is in fact available to the researcher, just as one would expect from a researcher conducting an experiment in her own domain of expertise. We propose a novel framework for formulating such knowledge using functional analysis that subsumes all of the aforementioned designs in that it recovers them as optimal under certain modeling choices, thus theoretically characterizing their comparative power under different assumptions and suggesting extensions of these to multi-arm trials. Furthermore, it suggests new optimal designs that are based on more robust nonparametric modeling and that offer extensive gains in precision and power. In certain cases we are able to argue linear convergence O(1/2^n) to the sample average treatment effect (as compared to the usual logarithmic convergence O(1/n)). We characterize generally the unbiasedness, variance, and consistency of resulting estimators; solve the design problem; and develop appropriate inferential algorithms for Fisherian and Neymanian hypotheses. We uncover connections to Bayesian experimental design and make extensions to dealing with non-compliance.
top of page

Uniform Post Selection Inference for Z-estimation problems

Alex Belloni (Duke University)

In this talk we will consider inference with high dimensional data. We propose new methods for estimating and constructing confidence regions for a regression parameter of primary interest alpha_0, a parameter in front of the regressor of interest, such as the treatment variable or a policy variable. We show how to apply these methods to Z-estimators (for example, logistic regression and quantile regression). These methods allow to estimate alpha_0 at the root-n rate when the total number p of other regressors, called controls, exceed the sample size n, using the sparsity assumptions. The sparsity assumption means that only s unknown controls are needed to accurately approximate the nuisance part of the regression function, where s is smaller than n. Importantly, the estimators and these resulting confidence regions are ``honest" in the formal sense that their properties hold uniformly over s-sparse models. Moreover, these procedures do not rely on traditional ``consistent model selection" arguments for their validity; in fact, they are robust with respect to ``moderate" model selection mistakes in variable selection steps. Furthermore, the estimators are semi-parametrically efficient in the sense of attaining the semi-parametric efficiency bounds for the class of models in this paper.
top of page

De-Preferential Attachment Random Graphs

Antar Bandyopadhyay (University of California, Berkeley)

In this talk we will introduce a new model of a growing sequence of random graphs where a new vertex is less likely to join to an existing vertex with high degree and more likely to join to a vertex with low degree. In contrast to the well studied model of preferential attachment random graphs where higher degree vertices are preferred, we will call our model de-preferential attachment random graph model. We will consider two types of de-preferential attachment models, namely, inverse de-preferential, where the attachment probabilities are inversely proportional to the degree and linear de-preferential, where the attachment probabilities are proportional to $c$-degree, where $c > 0$ is a constant. We will give asymptotic degree distribution for both the models and show that the limiting degree distribution has very thin tail. We will also show that for a fixed vertex $v$, the degree grows as $\sqrt{\log n}$ for the inverse de-preferential case and as $\log n$ for the linear case. Some of the results will also be generalized when each new vertex joins to $m > 1$ existing vertices.

This is a joint work with Subhabrata Sen, Stanford University.
top of page

Computationally and Statistically Efficient Estimation in High-Dimensions

Sahand Negahban (Yale University)

Modern techniques in data accumulation and sensing have led to an explosion in both the volume and variety of data. Many of the resulting estimation problems are high-dimensional, meaning that the number of parameters to estimate can be far greater than the number of examples. A major focus of my work has been developing an understanding of how hidden low-complexity structure in large datasets can be used to develop computationally efficient estimation methods. I will discuss a framework for establishing the error behavior of a broad class of estimators under high-dimensional scaling. I will then discuss how to compute these estimates and draw connections between the statistical and computational properties of our methods. Interestingly, the same tools used to establish good high-dimensional estimation performance have a direct impact for optimization: better conditioned statistical problems lead to more efficient computational methods.
top of page

Avoiding Outliers

Joel Spencer (Courant Institute, New York University)

Given $n$ vectors $r_j$ in $n$-space with all coefficients in $[-1,+1]$ one wants a vector $x=(x_1,...,x_n)$ with all $x_i=+1$ or $-1$ so that all dot products $x\cdot r_j$ are at most $K\sqrt{n}$ in absolute value, $K$ an absolute constant. A random $x$ would make $x\cdot r_j$ roughly Gaussian but there would be outliers. The existence of such an $x$ was first shown by the speaker, resolving a discrepancy question of Paul Erdős. However, the original argument did not give an effective algorithm. The first algorithm was found by Nikhil Bansal. The approach here, due to Lovett and Meka, is to begin with $x=(0,...,0)$ and let it float in a kind of restricted Brownian Motion until all the coordinates hit the boundary. There is some nice probability using multidimensional Gaussians, together with a martingale result on a “Gaussian Casino.”

Paul Erdős was an extraordinary twentieth century mathematician with a unique life style and the speaker will include personal reminiscences of him.
top of page

On the influence of the seed graph in the preferential attachment model

Sébastien Bubeck (Princeton University)

We are interested in the following question: suppose we generate a large graph according to the linear preferential attachment model---can we say anything about the initial (seed) graph? A precise answer to this question could lead to new insights for the diverse applications of the preferential attachment model. In this work we focus on the case of trees grown according to the preferential attachment model. We first show that the seed has no effect from a weak local limit point of view. On the other hand, we conjecture that different seeds lead to different distributions of limiting trees from a total variation point of view. We take some steps in proving this conjecture by focusing on star seeds.
top of page

Learning and estimation: separated at birth, reunited at last

Alexander Rakhlin (University of Pennsylvania, The Wharton School)

Abstract: We consider the problem of regression in three scenarios: (a) random design under the assumption that the model F is correctly specified, (b) distribution-free statistical learning with respect to a reference class F; and (c) online regression with no assumption on the generative process. The first problem is often studied in the literature on nonparametric estimation, the second falls within the purview of statistical learning theory, and the third is studied within the online learning community. It is recognized that complexity of the class F plays the key role in determining the minimax behavior: the importance of entropy in the study of estimation goes back to Le Cam, Ibragimov and Khas'minskii, and Birge; within the setting of statistical learning the importance of entropy was established in the work of Vapnik and Chervonenkis and in subsequent works on uniform LLN within empirical process theory. The corresponding complexities for online learning have only been found in the past few years. But do these three problems really differ from the minimax point of view? The question, which boils down to understanding well-specified and misspecified models, will be addressed in this talk.

Joint work with Karthik Sridharan and Sasha Tsybakov
top of page

Linear and Conic Programming Approaches to High-Dimensional Errors-in-variables Models

Alexandre Tsybakov (CREST-ENSAE)

We consider the regression model with observation error in the design when the dimension can be much larger than the sample size and the true parameter is sparse. We propose two new estimators, based on linear and conic programming, and we prove that they satisfy oracle inequalities similar to those for the model with exactly known covariates. The only difference is that they contain additional scaling with the $l_1$ or $l_2$ norm of the true parameter. The scaling with the $l_2$ norm is minimax optimal and it is achieved on conic programming, while the scaling with the $l_1$ norm is achieved on the linear programming estimator, which is easier to implement.

This is a joint work with Mathieu Rosenbaum.
top of page

Integer Feasibility of Random Polytopes

Karthekeyan Chandrasekaran (Harvard)

Optimization problems are ubiquitous in contemporary engineering. A principal barrier to solving several real-world optimization problems is input uncertainty. In this talk, I will present new tools to study probabilistic instances of integer programs. As an application, I will show a phase-transition phenomenon in a simple distribution model for random integer programs. Our main tool is an elementary connection between integer programming and matrix discrepancy. I will describe this connection and derive matching upper and lower bounds on the discrepancy of random Gaussian matrices.

Based on joint work with Santosh Vempala.
top of page

On the Power of Adversarial Infections in Networks

Michael Brautbar (MIT)

Over the last decade we have witnessed the rapid proliferation of online networks and Internet activity. Such activity is considered as a blessing but it brings with it a large increase in risk of computer malware --- malignant software that actively spreads from one computer to another. To date, the majority of existing models of malware spread use stochastic behavior, where the set of neighbors infected from the current set of infected nodes is chosen obliviously. In this work, we initiate the study of adversarial infection strategies which can decide intelligently which neighbors of infected nodes to infect next in order to maximize their spread, while maintaining a similar ``signature'' as the oblivious stochastic infection strategy as not to be discovered. We first establish that computing an optimal and near-optimal adversarial strategies is computational hard. We then identify necessary and sufficient conditions in terms of network structure and edge infection probabilities such that the adversarial process can infect polynomially more nodes than the stochastic process while maintaining a similar ``signature'' as the oblivious stochastic infection strategy. Among our results is a surprising connection between an additional structural quantity of interest in a network, the network toughness, and adversarial infections. Based on the network toughness, we characterize networks where existence of adversarial strategies that are pandemic (infect all nodes) is guaranteed, as well as efficiently computable.

Based on joint work with Moez Draief and Sanjeev Khanna.
top of page

Degree-degree dependencies in random graphs with heavy-tailed degrees

Nelly Litvak (University of Twente)

Mixing patterns in large self-organizing networks, such as the Internet, the World Wide Web, social and biological networks are often characterized by degree-degree dependencies between neighboring nodes. In assortative networks the degree-degree dependencies are positive (nodes with similar degrees tend to connect to each other), while in disassortative networks these dependencies are negative. One of the problems with the commonly used Pearson correlation coefficient, also known as the assortativity coefficient, is that its magnitude decreases with the network size in disassortative networks. This makes it impossible to compare mixing patterns, for example, in two web crawls of different sizes. In this talk we analytically investigate degree-degree dependencies for scale-free graph sequences. In particular, we show that the Pearson's correlation coefficient is non-negative in the large graph limit when the asymptotic degree distribution has an infinite third moment. Furthermore, we provide examples where the Pearson's correlation coefficient converges to zero in a network with strong negative degree-degree dependencies, and another example where this coefficient converges in distribution to a random variable. We suggest the alternative degree-degree dependency measure, based on Spearman's rho, and prove that this statistical estimator converges to an appropriate limit under quite general conditions. These conditions are proved to hold in common network models, such as the configuration model and the preferential attachment model.
top of page

Conditional Phenomena in Time and Space

Ramon van Handel (Princeton University)

Random phenomena are ubiquitous throughout science and engineering. Beyond the study of stochastic models in their own right, it is of importance in many applications to understand what information can be extracted on the basis of observed data. Mathematically, such ``data assimilation'' problems motivate the investigation of probabilistic phenomena that arise from conditioning. This topic has connections to several areas of probability, ergodic theory, measure theory, and statistical mechanics, as well as practical implications for the design and analysis of algorithms for data assimilation in complex systems. In this talk I will give an overview of some old and new problems in this setting, with emphasis on the behavior of conditional distributions and their approximations on long time intervals and in high dimension.

top of page

Multivariate Regression with Calibration

Lie Wang (MIT)

We propose a new method named calibrated multivariate regression (CMR) for fitting high dimensional multivariate regression models. Compared to existing methods, CMR calibrates the regularization for each regression task with respect to its noise level so that it is simultaneously tuning insensitive and achieves an improved finite sample performance. We also develop an efficient smoothed proximal gradient algorithm to implement it. Theoretically, it is proved that CMR achieves the optimal rate of convergence in parameter estimation. We illustrate the usefulness of CMR by thorough numerical simulations and show that CMR consistently outperforms existing multivariate regression methods. We also apply CMR on a brain activity prediction problem and find that CMR even outperforms the handcrafted models created by human experts.

top of page

Consistency of Co-clustering exchangeable graph data

David Choi (Heinz College, Carnegie Mellon University)

We analyze the problem of partitioning a 0-1 array or bipartite graph into subgroups (also known as co-clustering), under a relatively mild assumption that the data is generated by a general nonparametric process. This problem can be thought of as co-clustering under model misspecification; we show that the additional error due to misspecification can be bounded by O(n^(-1/4)). Our result suggests that under certain sparsity regimes, community detection algorithms may be robust to modeling assumptions, and that their usage is analogous to the usage of histograms in exploratory data analysis. The result also has connections to the recent literature on exchangeable graph models, graph limits, and graphons.

top of page

Semimartingale reflecting Brownian motions: tail asymptotics for stationary distributions

Jim Dai (Cornell University)

Multidimensional semimartingale reflecting Brownian motions (SRBMs) arise as the diffusion limits for stochastic networks. I will describe a powerful methodology to obtain the tail asymptotics of the stationary distribution of an SRBM. The methodology uses a moment generating function version of the basic adjoint relationship that characterizes the stationary distribution. The tail asymptotics can be used to predict quality of service in stochastic networks. It can also be used to speed up an algorithm, devised in Dai and Harrison (1992), to compute the stationary distribution. This talk is based on joint works with Masakio Miyazawa and Jian Wu.

top of page

Sublinear Optimization

Elad Hazan (Technion)

In many modern optimization problems, specifically those arising in machine learning, the amount data is too large to apply standard convex optimization methods. We'll discuss new optimization algorithms that make use of randomization to prune the data produce a correct solution albeit running in time which is smaller than the data representation, i.e. sublinear running time. We'll present such sublinear-time algorithms for linear classification, support vector machine training, semi-definite programming and other optimization problems. These new algorithms are based on a primal-dual approach, and use a combination of novel sampling techniques and the randomized implementation of online learning algorithms. We'll describe information-theoretic lower bounds that show our running times to be nearly best possible in the unit-cost RAM model.

top of page

Transitory Queueing Systems

Rahul Jain (University of Southern California)

"Transitory" queueing systems, namely those that exist only for finite time are all around us. And yet, current queueing theory provides with very few models and even fewer tools to understand them. In this talk, I'll introduce a model that may be more appropriate in "transitory" queueing situations. I'll start by first talking about the ``concert arrival game" that we introduced. This model captures the tradeoff in decision-making when users must decide whether to arrive early and wait in a queue, or arrive late but miss out on opportunities (e.g., best seats at a concert). With such decision-making by the users, it is a folly to assume that the arrival process is exogenous, and a well-behaved renewal process. Thus, under the fluid model, we characterize the equilibrium arrival profile and characterize the ``price of anarchy" (PoA). We also give some simple mechanisms to reduce it. We then present various extensions including to general feedforward networks when routing itself is strategic. The concert queueing game led us to the Delta(i)/GI/1 queue - a new model for transitory queueing systems, in which a each user picks their time of arrival from a common distribution. We derive the fluid and diffusion process limits for such a model. We find that the limiting queue length process is a reflection of a combination of a Brownian motion and a Brownian bridge. The convergence is established in the M1 topology, and we also show that it is not possible to obtain it in the stronger J1 topology. Such models allow us to study transitory queueing systems which are not very amenable to extant techniques in queueing theory.

top of page

Which side chooses in large random matching markets?

Yashodhan Kanoria (MSR New England and Columbia University)

We analyze large random two-sided matching markets with different numbers of men and women. We show that being on the short side of the market confers a large advantage, in a sense that the short side roughly chooses the matching while the long side roughly gets chosen. In addition, the matching is roughly the same under all stable matchings. Consider a market with n men and n+k women, for arbitrary 1 <= k<= n/2. We show that, with high probability, the men's average rank of wives is less than 5log(n/k) in any stable match. Further, men's average rank of wives in the women-optimal stable match (WOSM) is within a factor 1+\epsilon of that in the men-optimal stable match (MOSM). On the other hand, the women's average rank of husbands is at least n/(5 log(n/k)) in all stable matches, and is again very nearly the same in the WOSM and MOSM. Thus, our results imply that the core is likely to be "small". Simulations show that our the same features arise even in small markets. Joint work with Itai Ashlagi and Jacob D. Leshno

top of page

The Art of Gambling in a Team: Multi-Player Multi-Armed Bandits

Rahul Jain (University of Southern California)

Multi-Armed bandits are an elegant model of learning in an unknown and uncertain environment. Such models are relevant in many scenarios, and of late have received increased attention recently due to various problems of distributed control that have arisen in wireless networks, pricing models on the internet, etc. We consider a non-Bayesian multi-armed bandit setting proposed by Lai and Robbins in mid-80s. There are multiple arms each of which generates an i.i.d. reward from an unknown distribution. There are multiple players, each of whom is choosing which arm to play. If two or more players choose the same arm, they all get zero rewards. The problem is to design a learning algorithm to be used by the players that results in an orthogonal matching of players to arms (e.g., users to channels in wireless networks), and moreover minimizes the expected regret. We first consider this as an online bipartite matching problem. We model this combinatorial problem as a classical multi-armed bandit problem but with dependent arms, and propose an index-based learning algorithm that achieves logarithmic regret. From prior results, it is known that this is order-optimal. We then consider the distributed problem where players do not communicate or coordinate in any way. We propose an index-based algorithm that uses Bertsekas' auction mechanism to determine the bipartite matching. We show that the algorithm has expected regret at most near-log-squared. This is the first distributed multi-armed bandit learning algorithm in a general setting.

top of page

TBD

Rahul Mazumder (MIT)

Low-rank matrix regularization is an important area of research in statistics and machine learning with a wide range of applications. For example, in the context of Missing data imputation arising in recommender systems (e.g. the Netflix Prize), DNA microarray and audio signal processing, among others, the object of interest is a matrix (X) for which the training data comprises of a few noisy linear measurements. The task is to estimate X, under a low rank constraint and possibly additional affine constraints on X. In practice, the matrix dimensions frequently range from hundreds of thousands to even a million --- leading to severe computational challenges. In this talk, I will describe computationally tractable models and scalable (convex) optimization based algorithms for a class of low-rank regularized problems. Exploiting problem-specific statistical insights, problem structure and using novel tools for large scale SVD computations play important roles in this task. This is joint work with Trevor Hastie, Rob Tibshirani and Dennis Sun (Stanford Statistics).

top of page

Optimal detection of a sparse principal component

Philippe Rigollet (Princeton University)

Sparse Principal Component Analysis (SPCA) is a remarkably useful tool for practitioners who had been relying on ad-hoc thresholding methods. Our analysis aims at providing a a framework to test whether the data at hand indeed contains a sparse principal component. More precisely we propose an optimal test procedure to detect the presence of a sparse principal component in a high-dimensional covariance matrix. Our minimax optimal test is based on a sparse eigenvalue statistic. Alas, computing this test is known to be NP-complete in general and we describe a computationally efficient alternative test using convex relaxations. Our relaxation is also proved to detect sparse principal components at near optimal detection levels and performs very well on simulated datasets. Moreover, we exhibit some evidence from average case complexity that this slight deterioration may be unavoidable.

top of page

Growth of random surfaces

Alexei Borodin (Massachusetts Institute of Technology)

The goal of the talk is to describe a class of solvable random growth models of one and two-dimensional interfaces. The growth is local (distant parts of the interface grow independently), it has a smoothing mechanism (fractal boundaries do not appear), and the speed of growth depends on the local slope of the interface.

The models enjoy a rich algebraic structure that is reflected through closed determinantal formulas for the correlation functions. Large time asymptotic analysis of such formulas reveals asymptotic features of the emerging interface in different scales. Macroscopically, a deterministic limit shape phenomenon can be observed. Fluctuations around the limit shape range from universal laws of Random Matrix Theory to conformally invariant Gaussian processes in the plane. On the microscopic (lattice) scale, certain universal determinantal random point processes arise.

No preliminary knowledge of the material will be assumed.

top of page

The Coherence Phase Transition

Semen Shlosman (CNRS, France and Institute for Information Transmission Problems, Russia)

We study particle systems corresponding to highly connected queueing networks. We examine the validity of the so-called Poisson Hypothesis (PH), which predicts that the Markov process, describing the evolution of such particle system, started from a reasonable initial state, approaches the equilibrium in time independent of the size of the network.

This is indeed the case in many situations. However, there are networks for which the relaxation process slows down. This behavior reflects the fact that the corresponding infinite system undergoes a phase transition. It is characterized by the property that different nodes of the network start to evolve in a synchronous way.

Such transition can happen only when the load per node exceeds some critical value, while in the low load situation the PH behavior holds. The load thus plays here the same role as the inverse temperature in statistical mechanics.

We will mention a related open problem of ergodicity of interacting particle systems with unique stationary state.

The talk is based on joint works with A. Rybko and A. Vladimirov.

top of page

Mean-field Limit for General Queueing Networks on Infinite Graphs

Alexander Rybko (Institute for Information Transmission Problems, Russia)

We study the sequences of Markov processes defining the evolution of a general symmetric queueng network with the number of nodes tending to infinity. These sequences have limits which are nonlinear Markov processes. Time evolution of these nonlinear Markov processes are sometimes simpler and easier to analyze than the evolution of corresponding prelimit Markov processes on finite graphs. This fact give the possibility to find good approximation for the behavior of symmetric queueing networks on large graphs. Examples will be discussed.

The talk is based on a joint work with S. Shlosman.

top of page

Information theory of DNA sequencing

Guy Bresler (University of California, Berkeley)

DNA sequencing is the basic workhorse of modern biology and medicine. Shotgun sequencing is the dominant technique used: many randomly located short fragments called reads are extracted from the DNA sequence, and these reads are assembled to reconstruct the original DNA sequence. Despite major effort by the research community, the DNA sequencing problem remains unresolved from a practical perspective: it is currently not known how to produce a good assembly of most read data-sets.

By drawing an analogy between the DNA sequencing problem and the classic communication problem, we define an information theoretic notion of sequencing capacity. This is the maximum number of DNA base pairs that can be resolved reliably per read, and provides a fundamental limit to the performance that can be achieved by any assembly algorithm. We compute the sequencing capacity explicitly assuming an IID model for the DNA sequence and a noiseless read process. These basic results are then partially extended to arbitrary DNA sequences: the capacity is determined by a simple sufficient statistic of the sequence which can be computed for actual genomes.

The talk is based on joint works with Ma'ayan Bresler, Abolfazl Motahari, and David Tse.

Speaker Bio: Guy Bresler is currently a PhD candidate in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. Prior to that, he received the B.S. degree in electrical and computer engineering and the M.S. degree in mathematics from the University of Illinois at Urbana-Champaign, both in 2006. Guy is the recipient of an NSF Graduate Research Fellowship, a Vodafone Graduate Fellowship, the Barry M. Goldwater Scholarship, a Vodafone Undergraduate Scholarship, the E.C. Jordan Award from the ECE department at UIUC, and the Roberto Padovani Scholarship from Qualcomm. His research interests include information theory, applied probability, and computational biology.

top of page

Research Groups at Yandex and Moscow Institute of Physics and Technology

Andrei Raigorodksii (Yandex and Moscow Institute of Physics and Technology)

In my talk I will present our research groups in Yandex which is the main search engine in Russia, and Moscow Institute of Physics and Technology which is a kind of Russian MIT.

top of page

Web Graph Models and Their Applications

Andrei Raigorodksii (Yandex and Moscow Institute of Physics and Technology)

In the past 20 years, there has been a lot of work concerning modeling real networks such as Internet, WWW, social networks, biological networks, etc. In our talk, we will mainly concentrate on models devoted to incorporating properties of the web graph. We will give a survey of different such models including those of Bollob\'as--Riordan, Buckley--Osthus, Cooper--Frieze, etc. We will also present some new related mathematical results obtained by our group in Yandex, Moscow Institute of Physics and Technology, Moscow State University. Finally, we will discuss some applications of these results to ranking as well as some future research directions.

top of page

Conditional Coding with Limiting Computational Resources

Daniil Musatov (Yandex and Moscow Institute of Physics and Technology)

Consider the following situation. Alice knows string x and Bob knows string y. There is a limited capacity information channel from Alice to Bob. Alice wants to transmit a message that is sufficient for Bob to recover x. What is a minimum capacity that allows her to do it? Obviously, there is a theoretical minimum: the amount of information in x that is not contained in y. Astonishing enough, this is a precise estimate: Alice can produce an optimal code even though she does not know what information Bob already possesses. In Shannon entropy framework this fact is stated as Slepian-Wolf theorem. In Kolmogorov complexity framework (where both Alice and Bob use computable functions and logarithmic advice) it was proven by Muchnik. In the talk we will give another proof of Muchnik's result and extend it to space-bounded case, where Alice and Bob use space-bounded computations and advice is still logarithmic.

top of page

An Application of Talagrand's Inequality to Prove a Concentration of Second Degrees in Buckley-Osthus Random Graph Model

Liudmila Ostroumova (Yandex and Moscow Institute of Physics and Technology)

We consider the random graph model $H_{a,m}^n$. An explicit construction of this model was suggested by Buckley and Osthus. For any fixed positive integer $m$ and positive real $a$ we define a random graph $H_{a,m}^n$ in the following way. The graph $H_{a,1}^1$ consists of one vertex and one loop. Given a graph $H_{a,1}^{t-1}$ we transform it into $H_{a,1}^{t}$ by adding one new vertex and one new random edge $(t,i)$, where $i \in \{1, \dots, t \}$ and $\Prob(i=t) = \frac{a}{(a+1)t-1}$, $\Prob(i=s) = \frac{d_{H_{a,1}^{t-1}}(s) - 1 + a}{(a+1)t-1}$ if $1 \le s \le t-1$. To construct $H_{a,m}^n$ with $m>1$ we start from $H_{a,1}^{mn}$. Then we identify the vertices $1, \dots, m$ to form the first vertex; we identify the vertices $m+1, \dots, 2m$ to form the second vertex, etc.

The Buckley and Osthus random graph is a model of real-world networks. The parameter $a$ is called an ``initial attractiveness'' of a vertex. It was shown (by Buckley and Osthus for integer $a$ and by Grechnikov for real $a$) that the degree sequence in this model follows a power law distribution with exponent $-2-a$. We study second degrees of vertices in $H_{a,m}^n$. By the second degree of $t$ we mean the number of edges adjacent to the neighbors of $t$ except for the edges adjacent to the vertex $t$. Second degree of a vertex is an approximation of the PageRank. We show that the distribution of second degrees in $H_{a,1}^n$ follows a power law with exponent $-1-a$.

Proof of this result consists of two parts. The first step is to calculate the expectation of the number of vertices with second degree $k$ and the second step is to obtain a concentration. I will talk about the second step. It is a non-trivial application of Talagrand's inequality. Instead of Talagrand's inequality it is possible to apply Azuma's inequality, but the result will be weaker.

top of page

Van der Warden Number and Coloring of Hypergraphs with Large Girth

Dmitry Shabanov (Yandex and Moscow Institute of Physics and Technology)

WThe talk is devoted to the classical problem of estimating the Van der Waerden number $W(n,r)$. The famous Van der Waerden theorem states that, for any integers $n\ge 3$, $r\ge 2$, there exists the smallest integer $W(n,r)$ such that, for every $N\ge W(n,r)$, in any $r$-coloring of the set $\{1,2,\ldots,N\}$ there exists a monochromatic arithmetic progression of length $n$. The best upper bound for $W(n,r)$ in the general case was obtained by W.T. Gowers in 2001 who proved that $$ W(n,r)\le 2^{2^{r^{2^{2^{n+9}}}}}.$$ Our talk is concentrated on the lower bounds for the Van der Waerden number. We shall show that lower estimates for $W(n,r)$ are closely connected with extremal problems concerning colorings of uniform hypergraphs with large girth. Our main result is a new lower bound for $W(n,r)$: $$W(n,r)\ge c\,\frac {r^{n-1}}{\ln n},$$ where $c>0$ is an absolute constant. The proof is based on a continuous-time random recoloring process.

top of page

Delay optimality in switched networks (paper)

Yuan Zhong (MIT ORC)

We consider a switched (queueing) network in which there are constraints on which queues may be served simultaneously; such networks have been used to effectively model input-queued switches and wireless networks. The scheduling policy for such a network specifies which queues to serve at any point in time, based on the current state or past history of the system. In the main result, we provide a new class of online scheduling policies that achieve optimal average queue-size scaling for a class of switched networks including input-queued switches. In particular, it establishes the validity of a conjecture about optimal queue-size scaling for input-queued switches.

Speaker Bio: Yuan Zhong is a doctoral candidate in the Operations Research Center at MIT, under the supervision of Devavrat Shah and John Tsitsiklis. He has a BA degree in Mathematics from University of Cambridge, and a MA degree in Mathematics from California Institute of Technology. He is broadly interested in stochastic modeling and analysis, as well as their applications in various business and engineering domains.
top of page

Does god play dice? An I/O scheduling and airplane boarding perspective.

Eitan Bachmat (Ben-Gurion University)

Given N I/O requests to a disk drive, how long will it take the disk drive to service them? How long does it take to board an airplane? We will show that such questions are modeled by the same math that models the universe in relativity theory, namely, space-time (Lorentzian) geometry.As a result of this connection we will try to understand what is the best way to service I/O, what is the best airplane boarding policy and what makes free falling bodies take the trajectories that they do. On the way we may also encounter a silly card game and random matrices. The talk is self contained, experience with boarding an airplane is helpful.

Speaker Bio: Eitan Bachmat is a senior lecturer at the Computer Science department of Ben-Gurion U. He received his PhD in mathematics from M.I.T. He works mostly on the design of storage systems, performance analysis and applied probability. He also consults to the storage industry and participated in the development of several commercially available systems. Consequently, he has more patents than academic papers.
top of page

Asymptotics for preferential attachment random graphs via Stein's method (paper)

Erol Peköz (Boston University)

Preferential attachment random graphs evolve in time by sequentially adding vertices and edges randomly so that connections to vertices having high degree are favored. There has been an explosion of research on these models since Barabasi and Albert popularized them in 1999 to explain the so-called power law behavior observed in real networks such as the graph of the Internet where Web pages correspond to vertices and hyperlinks between them correspond to edges. In this talk we will show how to derive rates of convergence for the distribution of the degree of a fixed vertex to its distributional limit using a new variation of Stein's method that relies on finding couplings to motivate distributional transformations for which limit distributions are fixed points. Surprisingly little is known about these limit distributions despite the large literature on such graphs.
top of page

On Resolution, Sparse Signal Recovery, and Random Access Communication (abstract)

Vivek Goyal (MIT EECS)

Resolution of a data acquisition system is usually thought to be determined completely by sensing properties, such as density and accuracy of measurements. This talk advocates the view that resolution is, in addition, dependent on signal modeling and the complexity of computations allowed. This view is developed concretely for the acquisition of sparse signals, where the asymptotic relationship between resolution and the number of measurements is studied for algorithms with various complexities. The sparse signal recovery problem corresponds perfectly to a model of random multiple access communication in which the task of the receiver is to determine only the subset of the users that transmitted. This connection gives insight on the performance of single- and multi-user detection algorithms. Specifically, all known practical multiuser detection techniques become interference limited at high SNR. However, power control is natural in the multiple access setting, and we are able to prove that a simple detection algorithm based on orthogonal matching pursuit is not interference limited under optimal power control. The talk is based on joint work with Alyson Fletcher and Sundeep Rangan.
top of page

The Smoothed Linear Program for Approximate Dynamic Programming (abstract)

Vivek F. Farias (MIT Sloan)

We present a novel linear program for the approximation of the dynamic programming value function in high-dimensional stochastic control problems. LP approaches to approximate DP naturally restrict attention to approximations that, depending on the context, are upper or lower bounds to the optimal value function. Our program -- the `smoothed LP' -- relaxes this restriction in an appropriate fashion while remaining computationally tractable. Doing so appears to have several advantages: o We demonstrate superior bounds on the quality of approximation to the optimal value function afforded by our approach. o Experiments with our approach on a challenging problem (the game of Tetris) show that the approach outperforms a variety of approximate DP algorithms (including the LP approach, TD-learning and policy gradient methods) by a significant margin. Joint work with Vijay Desai and Ciamac Moallemi (Columbia).
top of page

Fractional simple random walk (abstract)

Scott Sheffield (MIT Math)

Fractional Brownian motions are the most natural generalizations of ordinary (one-dimensional) Brownian motions that allow for some amount of long range dependence (a.k.a. "momentum effects"). They are sometimes used in mathematical finance as models for logarithmic asset prices. We describe some natural random simple walks on the integers that have fractional Brownian motion as a scaling limit. In a sense, these walks are the most natural discrete analogs of fractional Brownian motion.
top of page

A New Look at the Compound Poisson Distribution and Compound Poisson Approximation using Entropy (abstract)

Mokshay Madiman (Yale University)

We develop an information-theoretic foundation for compound Poisson approximation and limit theorems (analogous to the corresponding developments for the central limit theorem and for simple Poisson approximation). First, su?cient conditions are given under which the compound Poisson distribution has maximal entropy within a natural class of probability measures on the nonnegative integers. In particular, it is shown that a maximum entropy property is valid if the measures under consideration are log-concave, but that it fails in general. Second, approximation bounds in the (strong) relative entropy sense are given for distributional approximation of sums of independent nonnegative integer valued random variables by compound Poisson distributions. The proof techniques involve the use of a notion of local information quantities that generalize the classical Fisher information used for normal approximation, as well as the use of ingredients from Stein's method for compound Poisson approximation. This work is joint with Andrew Barbour (Zurich), Oliver Johnson (Bristol) and Ioannis Kontoyiannis (AUEB).
top of page

Sequential algorithms for generating random graphs (abstract)

Mohsen Bayati (Microsoft Research New England)

Large-scale complex networks have been the objects of study for the past two decades, and one central problem have been constructing or designing realistic models for such networks. This problem appears in a variety of applications including coding theory and systems biology. Unfortunately, the existing algorithms for this problem have large running times, making them impractical to use for networks with millions of links. We will present a technique to design an almost linear time algorithm for generating random graphs with given degree sequences for a range of degrees. Next, we extend the analysis to design an algorithm for efficiently generating random graphs without short cycles. These algorithms can be used for constructing high performance low-density parity-check codes (LDPC codes). Based on joint works with Jeong Han Kim, Andrea Montanari, and Amin Saberi.
top of page

Convergence of unitary matrix integrals (abstract)

Benoît Collins (University of Ottawa)

We introduce the unitary Schwinger-Dyson equation associated to a selfadjoint polynomial potential V. The V=0 case corresponds to the free product state, so the Schwinger-Dyson equation can be considered as a deformation of free probability. We show that the solutions of this equation are unique for small V's and correspond to a large N limit of a multi-matrix model. This technique allows to show that a wide class of unitary and orthogonal multi-matrix models converge asymptotically. We also give a combinatorial and analytic description of the limit, solving a series of open questions raised in theoretical physics in the late 70's. This is joint work with A. Guionnet and E. Maurel-Segala.
top of page

The Gaussian random walk, sampling Brownian motion, and the Riemann zeta function (abstract)

Johan van Leeuwaarden (Eindhoven University of Technology, EURANDOM, NYU Stern)

We consider the Gaussian random walk (one-dimensional random walk with normally distributed increments), and in particular the moments of its maximum M. Explicit expressions for all moments of M are derived in terms of Taylor series with coefficients that involve the Riemann zeta function. We build upon the work of Chang and Peres (1997) on P(M=0) and Bateman's formulas on Lerch's transcendent. Our result for E(M) completes earlier work of Kingman (1965), Siegmund (1985), and Chang and Peres (1997). The maximum M shows up in a range of applications, such as sequentially testing for the drift of a Brownian motion, corrected diffusion approximations, simulation of Brownian motion, option pricing, thermodynamics of a polymer chain, and queueing systems in heavy traffic. Some of these applications are discussed, as well as several issues open for further research. This talk is based on joint work with A.J.E.M. Janssen.
top of page

Quantum Computation by Adiabatic Evolution (abstract)

Ed Farhi (MIT Physics)

The quantum adiabatic algorithm is a general approach to solving combinatorial search problems using a quantum computer. It will succeed if the run time is long enough. I will discuss how this algorithm works and our current understanding of the required run time.
top of page

Large deviations for random walks under subexponentiality: the big-jump domain (abstract)(slides)

Ton Dieker (Georgia Tech I&SE)

Stimulated by applications to internet traffic modeling and insurance mathematics, distributions with heavy tails have been widely studied over the past decades. This talk addresses a fundamental large-deviation problem for random walks with heavy-tailed step-size distributions. We consider so-called subexponential step-size distributions, which constitute the most widely-used class of heavy-tailed distributions. I will present a theory to study sequences {x_n} for which P{S_n>x_n} behaves asymptotically like n P {S_1>x_n} for large n. (joint work with D. Denisov and V. Shneer)
top of page

Bounds on Stationary Expectations for Markov Processes (abstract)

Peter Glynn (Stanford MS&E)

Many performance engineering and operations research modeling formulations lead to Markov models in which the key performance measure is an expectation defined in terms of the stationary distribution of the process. In models of realistic complexity, it is often difficult to compute such expectations in closed form. In this talk, we will discuss a simple class of bounds for such stationary expectations, and describe some of the mathematical subtleties that arise in making rigorous such bounds. We will also discuss how such bounds can be used algorithmically. This work is joint with Denis Saure and Assaf Zeevi.
top of page

Fragility of Asymptotic Agreement under Bayesian Learning (abstract)(slides)

Daron Acemoglu (MIT Econ)

Under the assumption that individuals know the conditional distributions of signals given the payoff-relevant parameters, existing results conclude that, as individuals observe infinitely many signals, their beliefs about the parameters will eventually merge. We first show that these results are fragile when individuals are uncertain about the signal distributions: given any such model, a vanishingly small individual uncertainty about the signal distributions can lead to a substantial (non-vanishing) amount of differences between the asymptotic beliefs. We then characterize the conditions under which a small amount of uncertainty leads only to a small amount of asymptotic disagreement. According to our characterization, this is the case if the uncertainty about the signal distributions is generated by a family with "rapidly-varying tails" (such as the normal or the exponential distributions). However, when this family has "regularly-varying tails" (such as the Pareto, the log-normal, and the t-distributions), a small amount of uncertainty leads to a substantial amount of asymptotic disagreement.
top of page

Quantile and Probability Curves without Crossing (abstract)(slides)

Victor Chernozhukov (MIT Econ)

The most common approach to estimating conditional quantile curves is to fit a curve, typically linear, pointwise for each quantile. Linear functional forms, coupled with pointwise fitting, are used for a number of reasons including parsimony of the resulting approximations and good computational properties. The resulting fits, however, may not respect a logical monotonicity requirement -- that the quantile curve be increasing as a function of probability. This paper studies the natural monotonization of these empirical curves induced by sampling from the estimated non-monotone model, and then taking the resulting conditional quantile curves that by construction are monotone in the probability. This construction of monotone quantile curves may be seen as a bootstrap and also as a monotonic rearrangement of the original non-monotone function. It is shown that the monotonized curves are closer to the true curves in finite samples, for any sample size. Under correct specification, the rearranged conditional quantile curves have the same asymptotic distribution as the original non-monotone curves. Under misspecification, however, the asymptotics of the rearranged curves may partially differ from the asymptotics of the original non-monotone curves. An analogous procedure is developed to monotonize the estimates of conditional distribution functions. The results are derived by establishing the compact (Hadamard) differentiability of the monotonized quantile and probability curves with respect to the original curves in discontinuous directions, tangentially to a set of continuous functions. In doing so, the compact differentiability of the rearrangement-related operators is established.
top of page

Exponential Error Bounds for Random Codes on the BSC (abstract)

David Forney (MIT LIDS)

One of Shannon's earliest results was his determination of the capacity of the binary symmetric channel (BSC). Shannon went on to show that, with randomly chosen codes and optimal decoding, the probability of decoding error decreases exponentially for any transmission rate less than capacity. Much of the important early work of Shannon, Elias, Fano and Gallager was devoted to determining bounds on the corresponding "error exponent." A later approach to this problem, pioneered by Csiszar and Korner, and now adopted in many modern information theory courses such as 6.441, determines error exponents using large-deviation theory. This approach has several advantages: (a) It can be remarkably simple and intuitive; (b) It gives correct error exponents, not just bounds; (c) It gives insight into how decoding errors typically occur. We illustrate this approach by developing the correct error exponents at all rates for both random codes and random linear codes on the BSC, assuming optimal decoding. We discuss how decoding errors typically occur. In particular, we show why the classical algebraic coding approach never had any hope of approaching channel capacity, even on the BSC. This talk is intended to be pitched at an elementary and tutorial level.
top of page

ORC

LIDS

Statistics@MIT

Maps & Directions