Linear Regression with Many Included Covariates


Whitney Newey (MIT Economics)
We consider asymptotic inference for linear regression coefficients when the number of included covariates grows as fast as the sample size. We find a limiting normal distribution with asymptotic variance that is larger than the usual one. We also find that all of the usual versions of heteroskedasticity consistent standard error estimators are inconsistent under this asymptotics. The problem with these standard errors is that they do not make a correct "degrees of freedom" adjustment. We propose a new heteroskedasticity consistent standard error formula that is fully automatic and robust to both (conditional) heteroskedasticity of unknown form and the inclusion of many covariates. We illustrate our findings in three distinct settings: (i) parametric linear models with many covariates, (ii) semiparametric semilinear models with many technical regressors, and (iii) linear panel models with many fixed effects.
top of page

Sparse Canonical Correlation Analysis: Minimaxity and Adaptivity


Harrison Huibin Zhou (Yale University)
Canonical correlation analysis is a widely used multivariate statistical technique for exploring the relation between two sets of variables. In this talk we consider the problem of estimating the leading canonical correlation directions in high dimensional settings. Recently, under the assumption that the leading canonical correlation directions are sparse, various procedures have been proposed for many high dimensional applications involving massive data sets. However, there has been few theoretical justification available in the literature. In this talk, we establish rateoptimal nonasymptotic minimax estimation with respect to an appropriate loss function for a wide range of model spaces. Two interesting phenomena are observed. First, the minimax rates are not affected by the presence of nuisance parameters, namely the covariance matrices of the two sets of random variables, though they need to be estimated in the canonical correlation analysis problem. Second, we allow the presence of the residual canonical correlation directions. However, they do not influence the minimax rates under a mild condition on eigengap. A generalized sintheta theorem and an empirical process bound for Gaussian quadratic forms under rank constraint are used to establish the minimax upper bounds, which may be of independent interest.
If time permits, we will discuss a computationally efficient twostage estimation procedure which consists of a convex programming based initialization stage and a group Lasso based refinement stage, and show some encouraging numerical results on simulated data sets and a breast cancer data set.
top of page

Optimal stochastic transport


Alfred Galichon (Sciences Po, Paris)
We explore the link between the MongeKantorovich problem and the Skorohod embedding problem. This question arises in particular in Mathematical Finance when seeking modelfree bounds on some option prices when the marginal distributions of the underlying at various maturities are implied by European options prices. We provide a stochastic control approach which we connect to several important constructions. Finally we revisit in this light the celebrated AzémaYor solution of the Skorohod embedding problem.
This talk is based on joint works with Guillaume Carlier, Pierre HenryLabordère and Nizar Touzi.
top of page

Beyond Berry Esseen: Structure and Learning of Sums of Random Variables


Constantinos Daskalakis (MIT EECS)
The celebrated BerryEsseen theorem, and its variants, provide a useful approximation to the sum of independent random variables by a Gaussian. In this talk, I will restrict attention to the important case of sums of integer random variables, arguing that BerryEsseen theorems fall short from characterizing their general structure. I will offer stronger finitary central limit theorems, tightly characterizing the structure of these distributions, and show their implications to learning. In particular, I will present algorithms that can learn sums of integer random variables from a number of samples that is independent of how many variables are summed over.
top of page

High Dimensional Covariance Matrix Estimations and Factor Models


Yuan Liao (University of Maryland)
Large covariance matrix estimation is crucial for highdimensional statistical inferences, and has also played an central role in factor analysis. Applications are found in analyzing financial risks, climate data, genomic data and PCA, etc. Commonly used approaches to estimating large covariances include shrinkages and sparse modeling. This talk will present new theoretical results on estimating large (inverse) covariance matrices under large N large T asymptotics, with a focus on the roles it plays in statistical inferences for large panel data and factor models. The application examples will include efficient estimation for factor analysis and portfolio allocations under large assets.
top of page

A Geometric Approach to Weakly Identified Econometric Models


Anna Mikusheva (MIT Economics)
Many nonlinear Econometric models show evidence of weak identification. In this paper
we consider minimum distance statistics and show that in a broad class of models the problem
of testing under weak identification is closely related to the problem of testing a curved null
in a finitesample Gaussian model. Using the curvature of the model, we develop new finitesample bounds on the distribution of minimumdistance statistics, which we show can be used
to detect weak identification and to construct tests robust to weak identification. We apply our
new method to new Keynesian Phillips curve and DSGE examples and show that it provides a
significant improvement over existing approaches.
top of page

Asymptotics and concentration for sample covariance


Vladimir Koltchinskii (Georgia Tech)
We will discuss recent moment bounds and concentration inequalities
for sample covariance operators based on a sample of n i.i.d. Gaussian random
variables taking values in an infinite dimensional space. These bounds show
that the size of the operator norm of the deviation of sample covariance from
the true covariance can be completely characterized by two parameters:
the operator norm of the true covariance and its so called "effective rank".
These results rely on Talagrand's generic chaining bounds and on Gaussian
concentration. We then discuss several asymptotic and concentration results
for spectral projectors of sample covariance in the case when the "effective
rank" is large, but it is smaller than the sample size. These results include, in particular, asymptotic normality of bilinear forms of
empirical spectral projectors and of squared HilbertSchmidt norms of
their deviations from the spectral projectors of the true covariance.
Most of the results are joint with Karim Lounici
top of page

Uncertainty quantification and confidence sets in highdimensional models


Richard Nickl (University of Cambridge)
While much attention has been paid recently to the construction of optimal algorithms that adaptively estimate lowdimensional parameters (described by sparsity, lowrank, or smoothness) in highdimensional models, the theory of statistical inference and uncertainty quantification (in particular hypothesis tests & confidence sets) is much less welldeveloped. We will discuss some perhaps surprising impossibility results in the basic highdimensional compressed sensing model, and some of the recently remerging positive results in the area.
top of page

Superposition codes and approximatemessagepassing decoder


Florent Krzakala (Université Pierre et Marie)
Superposition codes are asymptotically capacity achieving scheme for the Additive White Gaussian Noise channel. I will first show how a practical iterative decoder can be built based on a Belief Propagation type approach, closely related to the one performed in compressed sensing and sparse estimation problems. Secondly, I will show how the idea of spatial coupling in this context allows to built efficient and practical capacity achieving coding and decoding schemes. The links between the present problem, sparse estimations, and nonparametric belief propagation for continuous variables will be also discussed.
References:
Replica Analysis and Approximate Message Passing Decoder for Superposition Codes, J. Barbier and F. Krzakala, ISIT 2014.
Statistical physicsbased reconstruction in compressed sensing, F. Krzakala et al. Phys. Rev. X 2, 021005 (2012)
top of page

Clustering of sparse networks: Phase transitions and optimal algorithms


Lenka Zdeborova (CEA)
A central problem in analyzing networks is partitioning them into modules or communities, clusters with a statistically homogeneous pattern of links to each other or to the rest of the network. A principled approach to address this problem is to fit the network on a stochastic block model, this task is, however, intractable exactly. In this talk we discuss application of belief propagation algorithm to module detection. In the first part we present an asymptotically exact analysis of the stochastic block model. We quantify properties of the detectability/undetectability phase transition and the easy/hard phase transition for the module detection problem. In the second part we introduce a new class of spectral algorithms based on a nonbacktracking walk on the directed edges of the graph. We show that our algorithm is optimal for graphs generated by the stochastic block model, detecting communities all the way down to the theoretical limit. We also show the spectrum of the nonbacktracking operator for some realworld networks, illustrating its advantages over traditional spectral clustering.
References:
Phase transition in the detection of modules in sparse networks;
Aurelien Decelle, Florent Krzakala, Cristopher Moore, Lenka Zdeborová; Phys. Rev. Lett. 107, 065701 (2011).
Spectral redemption: clustering sparse networks;
Florent Krzakala, Cristopher Moore, Elchanan Mossel, Joe Neeman, Allan Sly, Lenka Zdeborová, Pan Zhang; Proceedings of the National Academy of Sciences 110, no. 52 (2013): 2093520940
top of page

RegressionRobust Designs of Controlled Experiments


Nathan Kallus (MIT)
Achieving balance between experimental groups is a cornerstone of causal inference. Without balance any observed difference may be attributed to a difference other than the treatment alone. In controlled/clinical trials, where the experimenter controls the administration of treatment, complete randomization of subjects has been the golden standard for achieving this balance because it allows for unbiased and consistent estimation and inference in the absence of any a priori knowledge or measurements. However, since estimator variance under complete randomization may be slow to converge, experimental designs that balance pretreatment measurements (baseline covariates) are in pervasive use, including randomized block designs, pairwisematched designs, and rerandomization. We formally argue that absolutely no balance better than complete randomization's can be achieved without partial structural knowledge about the treatment effects. Therefore, that balancing designs are in popular use and are advocated means that partial knowledge is in fact available to the researcher, just as one would expect from a researcher conducting an experiment in her own domain of expertise. We propose a novel framework for formulating such knowledge using functional analysis that subsumes all of the aforementioned designs in that it recovers them as optimal under certain modeling choices, thus theoretically characterizing their comparative power under different assumptions and suggesting extensions of these to multiarm trials. Furthermore, it suggests new optimal designs that are based on more robust nonparametric modeling and that offer extensive gains in precision and power. In certain cases we are able to argue linear convergence O(1/2^n) to the sample average treatment effect (as compared to the usual logarithmic convergence O(1/n)). We characterize generally the unbiasedness, variance, and consistency of resulting estimators; solve the design problem; and develop appropriate inferential algorithms for Fisherian and Neymanian hypotheses. We uncover connections to Bayesian experimental design and make extensions to dealing with noncompliance.
top of page

Uniform Post Selection Inference for Zestimation problems


Alex Belloni (Duke University)
In this talk we will consider inference with high dimensional data. We propose new methods for estimating and constructing confidence regions for a regression parameter of primary interest alpha_0, a parameter in front of the regressor of interest, such as the treatment variable or a policy variable. We show how to apply these methods to Zestimators (for example, logistic regression and quantile regression). These methods allow to estimate alpha_0 at the rootn rate when the total number p of other regressors, called controls, exceed the sample size n, using the sparsity assumptions. The sparsity assumption means that only s unknown controls are needed to accurately approximate the nuisance part of the regression function, where s is smaller than n. Importantly, the estimators and these resulting confidence regions are ``honest" in the formal sense that their properties hold uniformly over ssparse models. Moreover, these procedures do not rely on traditional ``consistent model selection" arguments for their validity; in fact, they are robust with respect to ``moderate" model selection mistakes in variable selection steps. Furthermore, the estimators are semiparametrically efficient in the sense of attaining the semiparametric efficiency bounds for the class of models in this paper.
top of page

DePreferential Attachment Random Graphs


Antar Bandyopadhyay (University of California, Berkeley)
In this talk we will introduce a new model of a growing sequence of random graphs where a new vertex is less likely to join to an existing vertex with high degree and more likely to join to a vertex with low degree. In contrast to the well studied model of preferential attachment random graphs where higher degree vertices are preferred, we will call our model depreferential attachment random graph model.
We will consider two types of depreferential attachment models, namely, inverse depreferential, where the attachment probabilities are inversely proportional to the degree and linear depreferential, where the attachment probabilities are proportional to \(c\)degree, where \(c > 0\) is a constant. We will give asymptotic degree distribution for both the models and show that the limiting degree distribution has very thin tail. We will also show that for a fixed vertex \(v\), the degree grows as \(\sqrt{\log n}\) for the inverse depreferential case and as \(\log n\) for the linear case. Some of the results will also be generalized when each new vertex joins to \(m > 1\) existing vertices.
This is a joint work with Subhabrata Sen, Stanford University.
top of page

Computationally and Statistically Efficient Estimation in HighDimensions


Sahand Negahban (Yale University)
Modern techniques in data accumulation and sensing have led to an explosion in both the volume and variety of data. Many of the resulting estimation problems are highdimensional, meaning that the number of parameters to estimate can be far greater than the number of examples. A major focus of my work has been developing an understanding of how hidden lowcomplexity structure in large datasets can be used to develop computationally efficient estimation methods. I will discuss a framework for establishing the error behavior of a broad class of estimators under highdimensional scaling. I will then discuss how to compute these estimates and draw connections between the statistical and computational properties of our methods. Interestingly, the same tools used to establish good highdimensional estimation performance have a direct impact for optimization: better conditioned statistical problems lead to more efficient computational methods.
top of page

Avoiding Outliers


Joel Spencer (Courant Institute, New York University)
Given \(n\) vectors \(r_j\) in \(n\)space
with all coefficients in \([1,+1]\) one wants a
vector \(x=(x_1,...,x_n)\) with all \(x_i=+1\) or \(1\) so
that all dot products \(x\cdot r_j\) are at most \(K\sqrt{n}\)
in absolute value, \(K\) an absolute constant. A random
\(x\) would make \(x\cdot r_j\) roughly Gaussian but there would
be outliers. The existence of such an \(x\) was first shown
by the speaker, resolving a discrepancy question of
Paul Erdős. However, the original argument did not
give an effective algorithm. The first algorithm was found by Nikhil
Bansal. The approach here, due to Lovett and Meka, is to
begin with \(x=(0,...,0)\) and let it float in a kind
of restricted Brownian Motion until all the coordinates hit
the boundary. There is some nice probability using multidimensional
Gaussians, together with a martingale result on a “Gaussian Casino.”
Paul Erdős was an extraordinary twentieth century mathematician
with a unique life style and the speaker will include personal
reminiscences of him.
top of page

On the influence of the seed graph in the preferential attachment model


Sébastien Bubeck (Princeton University)
We are interested in the following question: suppose we generate a large graph according to the linear preferential attachment modelcan we say anything about the initial (seed) graph? A precise answer to this question could lead to new insights for the diverse applications of the preferential attachment model. In this work we focus on the case of trees grown according to the preferential attachment model. We first show that the seed has no effect from a weak local limit point of view. On the other hand, we conjecture that different seeds lead to different distributions of limiting trees from a total variation point of view. We take some steps in proving this conjecture by focusing on star seeds.
top of page

Learning and estimation: separated at birth, reunited at last


Alexander Rakhlin (University of Pennsylvania, The Wharton School)
Abstract: We consider the problem of regression in three scenarios:
(a) random design under the assumption that the model F is correctly
specified, (b) distributionfree statistical learning with respect to
a reference class F; and (c) online regression with no assumption on
the generative process. The first problem is often studied in the
literature on nonparametric estimation, the second falls within the
purview of statistical learning theory, and the third is studied
within the online learning community. It is recognized that complexity
of the class F plays the key role in determining the minimax behavior:
the importance of entropy in the study of estimation goes back to Le
Cam, Ibragimov and Khas'minskii, and Birge; within the setting of
statistical learning the importance of entropy was established in the
work of Vapnik and Chervonenkis and in subsequent works on uniform LLN
within empirical process theory. The corresponding complexities for
online learning have only been found in the past few years. But do
these three problems really differ from the minimax point of view? The
question, which boils down to understanding wellspecified and
misspecified models, will be addressed in this talk.
Joint work with Karthik Sridharan and Sasha Tsybakov
top of page

Linear and Conic Programming Approaches to HighDimensional Errorsinvariables Models


Alexandre Tsybakov (CRESTENSAE)
We consider the regression model with observation error in the design
when the dimension can be much larger than the sample size and the true parameter is sparse. We propose two new estimators, based on linear and conic programming, and we prove that they satisfy oracle inequalities similar to those for the model with exactly known covariates. The only difference is that they contain additional scaling with the \(l_1\) or \(l_2\) norm of the true parameter. The scaling with the \(l_2\) norm is minimax optimal and it is achieved on conic programming, while the scaling with the \(l_1\) norm is achieved on the linear programming estimator, which is easier to implement.
This is a joint work with Mathieu Rosenbaum.
top of page

Integer Feasibility of Random Polytopes


Karthekeyan Chandrasekaran (Harvard)
Optimization problems are ubiquitous in contemporary engineering. A principal barrier to solving several realworld optimization problems is input uncertainty. In this talk, I will present new tools to study probabilistic instances of integer programs. As an application, I will show a phasetransition phenomenon in a simple distribution model for random integer programs. Our main tool is an elementary connection between integer programming and matrix discrepancy. I will describe this connection and derive matching upper and lower bounds on the discrepancy of random Gaussian matrices.
Based on joint work with Santosh Vempala.
top of page

On the Power of Adversarial Infections in Networks


Michael Brautbar (MIT)
Over the last decade we have witnessed the rapid proliferation of online networks and Internet activity. Such activity is considered as a blessing but it brings with it a large increase in risk of computer malware  malignant software that actively spreads from one computer to another. To date, the majority of existing models of malware spread use stochastic behavior, where the set of neighbors infected from the current set of infected nodes is chosen obliviously. In this work, we initiate the study of adversarial infection strategies which can decide intelligently which neighbors of infected nodes to infect next in order to maximize their spread, while maintaining a similar ``signature'' as the oblivious stochastic infection strategy as not to be discovered. We first establish that computing an optimal and nearoptimal adversarial strategies is computational hard. We then identify necessary and sufficient conditions in terms of network structure and edge infection probabilities such that the adversarial process can infect polynomially more nodes than the stochastic process while maintaining a similar ``signature'' as the oblivious stochastic infection strategy. Among our results is a surprising connection between an additional structural quantity of interest in a network, the network toughness, and adversarial infections. Based on the network toughness, we characterize networks where existence of adversarial strategies that are pandemic (infect all nodes) is guaranteed, as well as efficiently computable.
Based on joint work with Moez Draief and Sanjeev Khanna.
top of page

Degreedegree dependencies in random graphs with heavytailed degrees


Nelly Litvak (University of Twente)
Mixing patterns in large selforganizing networks, such as the Internet, the World Wide Web, social and biological networks are often characterized by degreedegree dependencies between neighboring nodes. In assortative networks the degreedegree dependencies are positive (nodes with similar degrees tend to connect to each other), while in disassortative networks these dependencies are negative. One of the problems with the commonly used Pearson correlation coefficient, also known as the assortativity coefficient, is that its magnitude decreases with the network size in disassortative networks. This makes it impossible to compare mixing patterns, for example, in two web crawls of different sizes.
In this talk we analytically investigate degreedegree dependencies for scalefree graph sequences. In particular, we show that the Pearson's correlation coefficient is nonnegative in the large graph limit when the asymptotic degree distribution has an infinite third moment. Furthermore, we provide examples where the Pearson's correlation coefficient converges to zero in a network with strong negative degreedegree dependencies, and another example where this coefficient converges in distribution to a random variable. We suggest the alternative degreedegree dependency measure, based on Spearman's rho, and prove that this statistical estimator converges to an appropriate limit under quite general conditions. These conditions are proved to hold in common network models, such as the configuration model and the preferential attachment model.
top of page

Conditional Phenomena in Time and Space


Ramon van Handel (Princeton University)
Random phenomena are ubiquitous throughout science and engineering. Beyond the study of stochastic models in their own right, it is of importance in many applications to understand what information can be extracted on the basis of observed data. Mathematically, such ``data assimilation'' problems motivate the investigation of probabilistic phenomena that arise from conditioning. This topic has connections to several areas of probability, ergodic theory, measure theory, and statistical mechanics, as well as practical implications for the design and analysis of algorithms for data assimilation in complex systems. In this talk I will give an overview of some old and new problems in this setting, with emphasis on the behavior of conditional distributions and their approximations on long time intervals and in high dimension.
top of page

Multivariate Regression with Calibration


Lie Wang (MIT)
We propose a new method named calibrated multivariate
regression
(CMR) for fitting high dimensional multivariate regression models.
Compared
to existing methods, CMR calibrates the regularization for each regression task with respect to its noise level so that it is simultaneously tuning insensitive and achieves an improved finite sample performance. We also develop an efficient smoothed proximal gradient algorithm to implement it. Theoretically, it is proved that CMR achieves the optimal rate of convergence in parameter estimation. We illustrate the usefulness of CMR by thorough numerical simulations and show that CMR consistently outperforms existing multivariate regression methods. We also apply CMR on a brain activity prediction problem and find that CMR even outperforms the handcrafted models created by human experts.
top of page

Consistency of Coclustering exchangeable graph data


David Choi (Heinz College, Carnegie Mellon University)
We analyze the problem of partitioning a 01 array or bipartite graph into subgroups (also known as coclustering), under a relatively mild assumption that the data is generated by a general nonparametric process. This problem can be thought of as coclustering under model misspecification; we show that the additional error due to misspecification can be bounded by O(n^(1/4)). Our result suggests that under certain sparsity regimes, community detection algorithms may be robust to modeling assumptions, and that their usage is analogous to the usage of histograms in exploratory data analysis. The result also has connections to the recent literature on exchangeable graph models, graph limits, and graphons.
top of page

Semimartingale reflecting Brownian motions: tail asymptotics for stationary distributions


Jim Dai (Cornell University)
Multidimensional semimartingale reflecting Brownian motions (SRBMs) arise as the diffusion limits for stochastic networks. I will describe a powerful methodology to obtain the tail asymptotics of the stationary distribution of an SRBM. The methodology uses a moment generating function version of the basic adjoint relationship that characterizes the stationary distribution. The tail asymptotics can be used to predict quality of service in stochastic networks. It can also be used to speed up an algorithm, devised in Dai and Harrison (1992), to compute the stationary distribution.
This talk is based on joint works with Masakio Miyazawa and Jian Wu.
top of page

Sublinear Optimization


Elad Hazan (Technion)
In many modern optimization problems, specifically those arising in machine learning, the amount data is too large to apply standard convex optimization methods. We'll discuss new optimization algorithms that make use of randomization to prune the data produce a correct solution albeit running in time which is smaller than the data representation, i.e. sublinear running time. We'll present such sublineartime algorithms for linear classification, support vector machine training, semidefinite programming and other optimization problems. These new algorithms are based on a primaldual approach, and use a combination of novel sampling techniques and the randomized implementation of online learning algorithms. We'll describe informationtheoretic lower bounds that show our running times to be nearly best possible in the unitcost RAM model.
top of page

Transitory Queueing Systems


Rahul Jain (University of Southern California)
"Transitory" queueing systems, namely those that exist only for finite time are all around us.
And yet, current queueing theory provides with very few models and even fewer tools to understand them.
In this talk, I'll introduce a model that may be more appropriate in "transitory" queueing situations.
I'll start by first talking about the ``concert arrival game" that we introduced. This model captures the
tradeoff in decisionmaking when users must decide whether to arrive early and wait in a queue, or arrive
late but miss out on opportunities (e.g., best seats at a concert). With such decisionmaking by the users, it
is a folly to assume that the arrival process is exogenous, and a wellbehaved renewal process. Thus, under
the fluid model, we characterize the equilibrium arrival profile and characterize the ``price of anarchy" (PoA).
We also give some simple mechanisms to reduce it. We then present various extensions including to general
feedforward networks when routing itself is strategic.
The concert queueing game led us to the Delta(i)/GI/1 queue  a new model for transitory queueing systems,
in which a each user picks their time of arrival from a common distribution. We derive the fluid and diffusion process
limits for such a model. We find that the limiting queue length process is a reflection of a combination of a Brownian
motion and a Brownian bridge. The convergence is established in the M1 topology, and we also show that it is not
possible to obtain it in the stronger J1 topology. Such models allow us to study transitory queueing systems which
are not very amenable to extant techniques in queueing theory.
top of page

Which side chooses in large random matching markets?


Yashodhan Kanoria (MSR New England and Columbia University)
We analyze large random twosided matching markets with different numbers of men and women. We show that being on the short side of the market confers a large advantage, in a sense that the short side roughly chooses the matching while the long side roughly gets chosen. In addition, the matching is roughly the same under all stable matchings. Consider a market with n men and n+k women, for arbitrary 1 <= k<= n/2. We show that, with high probability, the men's average rank of wives is less than 5log(n/k) in any stable match. Further, men's average rank of wives in the womenoptimal stable match (WOSM) is within a factor 1+\epsilon of that in the menoptimal stable match (MOSM). On the other hand, the women's average rank of husbands is at least n/(5 log(n/k)) in all stable matches, and is again very nearly the same in the WOSM and MOSM. Thus, our results imply that the core is likely to be "small".
Simulations show that our the same features arise even in small markets.
Joint work with Itai Ashlagi and Jacob D. Leshno
top of page

The Art of Gambling in a Team: MultiPlayer MultiArmed Bandits


Rahul Jain (University of Southern California)
MultiArmed bandits are an elegant model of learning in an unknown and uncertain environment. Such models are relevant in many scenarios, and of late have received increased attention recently due to various problems of distributed control that have arisen in wireless networks, pricing models on the internet, etc. We consider a nonBayesian multiarmed bandit setting proposed by Lai and Robbins in mid80s. There are multiple arms each of which generates an i.i.d. reward from an unknown distribution. There are multiple players, each of whom is choosing which arm to play. If two or more players choose the same arm, they all get zero rewards. The problem is to design a learning algorithm to be used by the players that results in an orthogonal matching of players to arms (e.g., users to channels in wireless networks), and moreover minimizes the expected regret.
We first consider this as an online bipartite matching problem. We model this combinatorial problem as a classical multiarmed bandit problem but with dependent arms, and propose an indexbased learning algorithm that achieves logarithmic regret. From prior results, it is known that this is orderoptimal. We then consider the distributed problem where players do not communicate or coordinate in any way. We propose an indexbased algorithm that uses Bertsekas' auction mechanism to determine the bipartite matching. We show that the algorithm has expected regret at most nearlogsquared. This is the first distributed multiarmed bandit learning algorithm in a general setting.
top of page

TBD


Rahul Mazumder (MIT)
Lowrank matrix regularization is an important area of research in statistics and machine learning with a wide range of applications. For example, in the context of Missing data imputation arising in recommender systems (e.g. the Netflix Prize), DNA microarray and audio signal processing, among others, the object of interest is a matrix (X) for which the training data comprises of a few noisy linear measurements. The task is to estimate X, under a low rank constraint and possibly additional affine constraints on X. In practice, the matrix dimensions frequently range from hundreds of thousands to even a million  leading to severe computational challenges. In this talk, I will describe computationally tractable models and scalable (convex) optimization based algorithms for a class of lowrank regularized problems. Exploiting problemspecific statistical insights, problem structure and using novel tools for large scale SVD computations play important roles in this task.
This is joint work with Trevor Hastie, Rob Tibshirani and Dennis Sun (Stanford Statistics).
top of page

Optimal detection of a sparse principal component


Philippe Rigollet (Princeton University)
Sparse Principal Component Analysis (SPCA) is a remarkably useful tool for practitioners who had been relying on adhoc thresholding methods. Our analysis aims at providing a a framework to test whether the data at hand indeed contains a sparse principal component. More precisely we propose an optimal test procedure to detect the presence of a sparse principal component in a highdimensional covariance matrix. Our minimax optimal test is based on a sparse eigenvalue statistic. Alas, computing this test is known to be NPcomplete in general and we describe a computationally efficient alternative test using convex relaxations. Our relaxation is also proved to detect sparse principal components at near optimal detection levels and performs very well on simulated datasets. Moreover, we exhibit some evidence from average case complexity that this slight deterioration may be unavoidable.
top of page

Growth of random surfaces


Alexei Borodin (Massachusetts Institute of Technology)
The goal of the talk is to describe a class of solvable
random growth models of one and twodimensional interfaces. The growth
is local (distant parts of the interface grow independently), it has a
smoothing mechanism (fractal boundaries do not appear), and the speed
of growth depends on the local slope of the interface.

The models enjoy a rich algebraic structure that is reflected through
closed determinantal formulas for the correlation functions. Large
time asymptotic analysis of such formulas reveals asymptotic features
of the emerging interface in different scales. Macroscopically, a
deterministic limit shape phenomenon can be observed. Fluctuations
around the limit shape range from universal laws of Random Matrix
Theory to conformally invariant Gaussian processes in the plane. On
the microscopic (lattice) scale, certain universal determinantal
random point processes arise.

No preliminary knowledge of the material will be assumed.

top of page

The Coherence Phase Transition


Semen Shlosman (CNRS, France and Institute for Information Transmission Problems, Russia)
We study particle systems corresponding to highly connected queueing networks. We examine the validity of the socalled Poisson Hypothesis (PH), which predicts that the Markov process, describing the evolution of such particle system, started from a reasonable initial state, approaches the equilibrium in time independent of the size of the network.

This is indeed the case in many situations. However, there are networks for which the relaxation process slows down. This behavior reflects the fact that the corresponding infinite system undergoes a phase transition. It is characterized by the property that different nodes of the network start to evolve in a synchronous way.

Such transition can happen only when the load per node exceeds some critical value, while in the low load situation the PH behavior holds. The load thus plays here the same role as the inverse temperature in statistical mechanics.

We will mention a related open problem of ergodicity of interacting particle systems with unique stationary state.

The talk is based on joint works with A. Rybko and A. Vladimirov.

top of page

Meanfield Limit for General Queueing Networks on Infinite Graphs


Alexander Rybko (Institute for Information Transmission Problems, Russia)
We study the sequences of Markov processes defining the evolution of a general symmetric queueng network with the number of nodes tending to infinity. These sequences have limits which are nonlinear Markov processes. Time evolution of these nonlinear Markov processes are sometimes simpler and easier to analyze than the evolution of corresponding prelimit Markov processes on finite graphs. This fact give the possibility to find good approximation for the behavior of symmetric queueing networks on large graphs. Examples will be discussed.

The talk is based on a joint work with S. Shlosman.

top of page

Information theory of DNA sequencing


Guy Bresler (University of California, Berkeley)
DNA sequencing is the basic workhorse of modern biology and medicine. Shotgun sequencing is the dominant technique used: many randomly located short fragments called reads are extracted from the DNA sequence, and these reads are assembled to reconstruct the original DNA sequence. Despite major effort by the research community, the DNA sequencing problem remains unresolved from a practical perspective: it is currently not known how to produce a good assembly of most read datasets.

By drawing an analogy between the DNA sequencing problem and the classic communication problem, we define an information theoretic notion of sequencing capacity. This is the maximum number of DNA base pairs that can be resolved reliably per read, and provides a fundamental limit to the performance that can be achieved by any assembly algorithm. We compute the sequencing capacity explicitly assuming an IID model for the DNA sequence and a noiseless read process. These basic results are then partially extended to arbitrary DNA sequences: the capacity is determined by a simple sufficient statistic of the sequence which can be computed for actual genomes.

The talk is based on joint works with Ma'ayan Bresler, Abolfazl Motahari, and David Tse.

Speaker Bio: Guy Bresler is currently a PhD candidate in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. Prior to that, he received the B.S. degree in electrical and computer engineering and the M.S. degree in mathematics from the University of Illinois at UrbanaChampaign, both in 2006. Guy is the recipient of an NSF Graduate Research Fellowship, a Vodafone Graduate Fellowship, the Barry M. Goldwater Scholarship, a Vodafone Undergraduate Scholarship, the E.C. Jordan Award from the ECE department at UIUC, and the Roberto Padovani Scholarship from Qualcomm. His research interests include information theory, applied probability, and computational biology.

top of page

Research Groups at Yandex and Moscow Institute of Physics and Technology


Andrei Raigorodksii (Yandex and Moscow Institute of Physics and Technology)
In my talk I will present our research groups in Yandex which is the main search engine in Russia, and Moscow Institute of Physics and Technology which is a kind of Russian MIT.

top of page

Web Graph Models and Their Applications


Andrei Raigorodksii (Yandex and Moscow Institute of Physics and Technology)
In the past 20 years, there has been a lot of work concerning modeling real networks such as Internet, WWW, social networks, biological networks, etc. In our talk, we will mainly concentrate on models devoted to incorporating properties of the web graph. We will give a survey of different such models including those of Bollob\'asRiordan, BuckleyOsthus, CooperFrieze, etc. We will also present some new related mathematical results obtained by our group in Yandex, Moscow Institute of Physics and Technology, Moscow State University. Finally, we will discuss some applications of these results to ranking as well as some future research directions.

top of page

Conditional Coding with Limiting Computational Resources


Daniil Musatov (Yandex and Moscow Institute of Physics and Technology)
Consider the following situation. Alice knows string x and Bob knows string y. There is a limited capacity information channel from Alice to Bob. Alice wants to transmit a message that is sufficient for Bob to recover x. What is a minimum capacity that allows her to do it? Obviously, there is a theoretical minimum: the amount of information in x that is not contained in y. Astonishing enough, this is a precise estimate: Alice can produce an optimal code even though she does not know what information Bob already possesses. In Shannon entropy framework this fact is stated as SlepianWolf theorem. In Kolmogorov complexity framework (where both Alice and Bob use computable functions and logarithmic advice) it was proven by Muchnik. In the talk we will give another proof of Muchnik's result and extend it to spacebounded case, where Alice and Bob use spacebounded computations and advice is still logarithmic.

top of page

An Application of Talagrand's Inequality to Prove a Concentration of Second Degrees in BuckleyOsthus Random Graph Model


Liudmila Ostroumova (Yandex and Moscow Institute of Physics and Technology)
We consider the random graph model $H_{a,m}^n$. An explicit construction of this model was suggested by Buckley and Osthus. For any fixed positive integer $m$ and positive real $a$ we define a random graph $H_{a,m}^n$ in the following way. The graph $H_{a,1}^1$ consists of one vertex and one loop. Given a graph $H_{a,1}^{t1}$ we transform it into $H_{a,1}^{t}$ by adding one new vertex and one new random edge $(t,i)$, where $i \in \{1, \dots, t \}$ and $\Prob(i=t) = \frac{a}{(a+1)t1}$, $\Prob(i=s) = \frac{d_{H_{a,1}^{t1}}(s)  1 + a}{(a+1)t1}$ if $1 \le s \le t1$. To construct $H_{a,m}^n$ with $m>1$ we start from $H_{a,1}^{mn}$. Then we identify the vertices $1, \dots, m$ to form the first vertex; we identify the vertices $m+1, \dots, 2m$ to form the second vertex, etc.

The Buckley and Osthus random graph is a model of realworld networks. The parameter $a$ is called an ``initial attractiveness'' of a vertex. It was shown (by Buckley and Osthus for integer $a$ and by Grechnikov for real $a$) that the degree sequence in this model follows a power law distribution with exponent $2a$. We study second degrees of vertices in $H_{a,m}^n$. By the second degree of $t$ we mean the number of edges adjacent to the neighbors of $t$ except for the edges adjacent to the vertex $t$. Second degree of a vertex is an approximation of the PageRank. We show that the distribution of second degrees in $H_{a,1}^n$ follows a power law with exponent $1a$.

Proof of this result consists of two parts. The first step is to calculate the expectation of the number of vertices with second degree $k$ and the second step is to obtain a concentration. I will talk about the second step. It is a nontrivial application of Talagrand's inequality. Instead of Talagrand's inequality it is possible to apply Azuma's inequality, but the result will be weaker.

top of page

Van der Warden Number and Coloring of Hypergraphs with Large Girth


Dmitry Shabanov (Yandex and Moscow Institute of Physics and Technology)
WThe talk is devoted to the classical problem of estimating the Van der Waerden number $W(n,r)$. The famous Van der Waerden theorem states that, for any integers $n\ge 3$, $r\ge 2$, there exists the smallest integer $W(n,r)$ such that, for every $N\ge W(n,r)$, in any $r$coloring of the set $\{1,2,\ldots,N\}$ there exists a monochromatic arithmetic progression of length $n$. The best upper bound for $W(n,r)$ in the general case was obtained by W.T. Gowers in 2001 who proved that $$ W(n,r)\le 2^{2^{r^{2^{2^{n+9}}}}}.$$ Our talk is concentrated on the lower bounds for the Van der Waerden number. We shall show that lower estimates for $W(n,r)$ are closely connected with extremal problems concerning colorings of uniform hypergraphs with large girth. Our main result is a new lower bound for $W(n,r)$: $$W(n,r)\ge c\,\frac {r^{n1}}{\ln n},$$ where $c>0$ is an absolute constant. The proof is based on a continuoustime random recoloring process.

top of page

Delay optimality in switched networks
(paper) 

Yuan Zhong (MIT ORC)
We consider a switched (queueing) network in which there are constraints
on which queues may be served simultaneously; such networks have been
used to effectively model inputqueued switches and wireless networks.
The scheduling policy for such a network specifies which queues to serve at any
point in time, based on the current state or past history of the system.
In the main result, we provide a new class of
online scheduling policies that achieve optimal average queuesize scaling
for a class of switched networks including inputqueued switches.
In particular, it establishes the validity of a conjecture about optimal
queuesize scaling for inputqueued switches.

Speaker Bio: Yuan Zhong is a doctoral candidate in the Operations Research Center at MIT, under
the supervision of Devavrat Shah and John Tsitsiklis. He has a BA degree in Mathematics
from University of Cambridge, and a MA degree in Mathematics from California Institute of Technology.
He is broadly interested in stochastic modeling and analysis, as well as their applications in various
business and engineering domains.
top of page

Does god play dice? An I/O scheduling and airplane boarding perspective.


Eitan Bachmat (BenGurion University)
Given N I/O requests to a disk drive, how long will it take the
disk drive to service them? How long does it take to board an airplane? We
will show that such questions are modeled by the same math that models the
universe in relativity theory, namely, spacetime (Lorentzian) geometry.As a
result of this connection we will try to understand what is the best way to
service I/O, what is the best airplane boarding policy and what makes free
falling bodies take the trajectories that they do. On the way we may also
encounter a silly card game and random matrices. The talk is self
contained, experience with boarding an airplane is helpful.

Speaker Bio: Eitan Bachmat is a senior lecturer at the Computer Science department
of BenGurion U. He received his PhD in mathematics from M.I.T. He works
mostly on the design of storage systems, performance analysis and applied
probability. He also consults to the storage industry and participated in
the development of several commercially available systems. Consequently, he
has more patents than academic papers.
top of page

Asymptotics for preferential attachment random graphs via Stein's method
(paper) 

Erol Peköz (Boston University)
Preferential attachment random graphs evolve in time by
sequentially adding vertices and edges randomly so that
connections to vertices having high degree are favored.
There has been an explosion of research on these models
since Barabasi and Albert popularized them in 1999 to explain
the socalled power law behavior observed in real networks
such as the graph of the Internet
where Web pages correspond to vertices and hyperlinks between them correspond to edges.
In this talk we will show how to derive rates of convergence for the distribution of the degree
of a fixed vertex to its distributional limit using a new variation of Stein's method
that relies on finding couplings to motivate distributional transformations
for which limit distributions are fixed points.
Surprisingly little is known about these limit distributions
despite the large literature on such graphs.
top of page

On Resolution, Sparse Signal Recovery, and Random Access Communication
(abstract) 

Vivek Goyal (MIT EECS)
Resolution of a data acquisition system is usually thought to be
determined completely by sensing properties, such as density and
accuracy of measurements. This talk advocates the view that resolution
is, in addition, dependent on signal modeling and the complexity of
computations allowed. This view is developed concretely for the
acquisition of sparse signals, where the asymptotic relationship
between resolution and the number of measurements is studied for
algorithms with various complexities. The sparse signal recovery
problem corresponds perfectly to a model of random multiple access
communication in which the task of the receiver is to determine only
the subset of the users that transmitted. This connection gives
insight on the performance of single and multiuser detection
algorithms. Specifically, all known practical multiuser detection
techniques become interference limited at high SNR. However, power
control is natural in the multiple access setting, and we are able to
prove that a simple detection algorithm based on orthogonal matching
pursuit is not interference limited under optimal power control. The
talk is based on joint work with Alyson Fletcher and Sundeep Rangan.
top of page

The Smoothed Linear Program for Approximate Dynamic Programming
(abstract) 

Vivek F. Farias (MIT Sloan)
We present a novel linear program for the approximation of the dynamic
programming value function in highdimensional stochastic control problems. LP
approaches to approximate DP naturally restrict attention to approximations
that, depending on the context, are upper or lower bounds to the optimal value
function. Our program  the `smoothed LP'  relaxes this restriction in an
appropriate fashion while remaining computationally tractable. Doing so
appears to have several advantages:
o We demonstrate superior bounds on the quality of approximation to the optimal
value function afforded by our approach.
o Experiments with our approach on a challenging problem (the game of Tetris)
show that the approach outperforms a variety of approximate DP algorithms
(including the LP approach, TDlearning and policy gradient methods) by a
significant margin.
Joint work with Vijay Desai and Ciamac Moallemi (Columbia).
top of page

Fractional simple random walk
(abstract) 

Scott Sheffield (MIT Math)
Fractional Brownian motions are the most natural generalizations of
ordinary (onedimensional) Brownian motions that allow for some amount
of long range dependence (a.k.a. "momentum effects"). They are
sometimes used in mathematical finance as models for logarithmic asset
prices. We describe some natural random simple walks on the integers
that have fractional Brownian motion as a scaling limit. In a sense,
these walks are the most natural discrete analogs of fractional
Brownian motion.
top of page

A New Look at the Compound Poisson Distribution and Compound
Poisson Approximation using Entropy
(abstract) 

Mokshay Madiman (Yale University)
We develop an informationtheoretic foundation for compound Poisson
approximation and limit
theorems (analogous to the corresponding developments for the central
limit theorem and for
simple Poisson approximation). First, su?cient conditions are given
under which the compound
Poisson distribution has maximal entropy within a natural class of
probability measures on the
nonnegative integers. In particular, it is shown that a maximum
entropy property is valid if the
measures under consideration are logconcave, but that it fails in
general. Second, approximation
bounds in the (strong) relative entropy sense are given for
distributional approximation of sums
of independent nonnegative integer valued random variables by compound
Poisson distributions.
The proof techniques involve the use of a notion of local information
quantities that generalize the
classical Fisher information used for normal approximation, as well as
the use of ingredients from
Stein's method for compound Poisson approximation.
This work is joint with Andrew Barbour (Zurich), Oliver Johnson
(Bristol) and Ioannis Kontoyiannis (AUEB).
top of page

Sequential algorithms for generating random graphs
(abstract) 

Mohsen Bayati (Microsoft Research New England)
Largescale complex networks have been the objects of study for the past two
decades, and one central problem have been constructing or designing realistic
models for such networks. This problem appears in a variety of applications
including coding theory and systems biology. Unfortunately, the existing
algorithms for this problem have large running times, making them impractical
to use for networks with millions of links.
We will present a technique to design an almost linear time algorithm for
generating random graphs with given degree sequences for a range of degrees.
Next, we extend the analysis to design an algorithm for efficiently generating
random graphs without short cycles. These algorithms can be used for
constructing high performance lowdensity paritycheck codes (LDPC codes).
Based on joint works with Jeong Han Kim, Andrea Montanari, and Amin Saberi.
top of page

Convergence of unitary matrix integrals
(abstract) 

Benoît Collins (University of Ottawa)
We introduce the unitary SchwingerDyson equation associated to a
selfadjoint polynomial potential V. The V=0 case corresponds to the free product
state, so the SchwingerDyson equation can be considered as a deformation
of free probability. We show that the solutions of this equation
are unique for small V's and correspond to a large N limit of a
multimatrix model. This technique allows to show that a wide class of
unitary and orthogonal multimatrix models converge asymptotically.
We also give a combinatorial and analytic description of the limit, solving
a series of open questions raised in theoretical physics in the late 70's.
This is joint work with A. Guionnet and E. MaurelSegala.
top of page

The Gaussian random walk, sampling Brownian motion, and the Riemann zeta function
(abstract) 

Johan van Leeuwaarden (Eindhoven University of Technology, EURANDOM,
NYU Stern)
We consider the Gaussian random walk (onedimensional random walk with
normally distributed increments), and in particular the moments of its maximum
M. Explicit expressions for all moments of M are derived in terms of Taylor
series with coefficients that involve the Riemann zeta function. We build upon
the work of Chang and Peres (1997) on P(M=0) and Bateman's formulas on Lerch's
transcendent. Our result for E(M) completes earlier work of Kingman (1965),
Siegmund (1985), and Chang and Peres (1997).
The maximum M shows up in a range of applications, such as sequentially testing
for the drift of a Brownian motion, corrected diffusion approximations,
simulation of Brownian motion, option pricing, thermodynamics of a polymer
chain, and queueing systems in heavy traffic. Some of these applications are
discussed, as well as several issues open for further research. This talk is
based on joint work with A.J.E.M. Janssen.
top of page

Quantum Computation by Adiabatic Evolution
(abstract) 

Ed Farhi (MIT Physics)
The quantum adiabatic algorithm is a general approach to solving combinatorial
search problems using a quantum computer. It will succeed if the run time is
long enough. I will discuss how this algorithm works and our current
understanding of the required run time.
top of page

Large deviations for random walks under subexponentiality: the
bigjump domain
(abstract)(slides) 

Ton Dieker (Georgia Tech I&SE)
Stimulated by applications to internet traffic modeling and insurance
mathematics, distributions with heavy tails have been widely studied
over the past decades. This talk addresses a fundamental
largedeviation problem for random walks with heavytailed stepsize
distributions. We consider socalled subexponential stepsize
distributions, which constitute the most widelyused class of
heavytailed distributions. I will present a theory to study sequences
{x_n} for which P{S_n>x_n} behaves asymptotically like n P {S_1>x_n}
for large n.
(joint work with D. Denisov and V. Shneer)
top of page

Bounds on Stationary Expectations for Markov Processes
(abstract) 

Peter Glynn (Stanford MS&E)
Many performance engineering and operations research modeling
formulations lead to Markov models in which the key performance measure
is an expectation defined in terms of the stationary distribution of the
process. In models of realistic complexity, it is often difficult to
compute such expectations in closed form. In this talk, we will discuss
a simple class of bounds for such stationary expectations, and describe
some of the mathematical subtleties that arise in making rigorous such
bounds. We will also discuss how such bounds can be used
algorithmically. This work is joint with Denis Saure and Assaf Zeevi.
top of page

Fragility of Asymptotic Agreement under Bayesian Learning
(abstract)(slides) 

Daron Acemoglu (MIT Econ)
Under the assumption that individuals know the conditional distributions of
signals given the payoffrelevant parameters, existing results conclude that,
as individuals observe infinitely many signals, their beliefs about the
parameters will eventually merge. We first show that these results are fragile
when individuals are uncertain about the signal distributions: given any such
model, a vanishingly small individual uncertainty about the signal
distributions can lead to a substantial (nonvanishing) amount of differences
between the asymptotic beliefs. We then characterize the conditions under which
a small amount of uncertainty leads only to a small amount of asymptotic
disagreement. According to our characterization, this is the case if the
uncertainty about the signal distributions is generated by a family with
"rapidlyvarying tails" (such as the normal or the exponential distributions).
However, when this family has "regularlyvarying tails" (such as the Pareto,
the lognormal, and the tdistributions), a small amount of uncertainty leads
to a substantial amount of asymptotic disagreement.
top of page

Quantile and Probability Curves without Crossing
(abstract)(slides) 

Victor Chernozhukov (MIT Econ)
The most common approach to estimating conditional quantile curves is to fit a curve, typically linear, pointwise for each quantile. Linear functional forms, coupled with pointwise fitting, are used for a number of reasons including parsimony of the resulting approximations and good computational properties. The resulting fits, however, may not respect a logical monotonicity requirement  that the quantile curve be increasing as a function of probability. This paper studies the natural monotonization of these empirical curves induced by sampling from the estimated nonmonotone model, and then taking the resulting conditional quantile curves that by construction are monotone in the probability. This construction of monotone quantile curves may be seen as a bootstrap and also as a monotonic rearrangement of the original nonmonotone function. It is shown that the monotonized curves are closer to the true curves in finite samples, for any sample size. Under correct specification, the rearranged conditional quantile curves have the same asymptotic distribution as the original nonmonotone curves. Under misspecification, however, the asymptotics of the rearranged curves may partially differ from the asymptotics of the original nonmonotone curves. An analogous procedure is developed to monotonize the estimates of conditional distribution functions. The results are derived by establishing the compact (Hadamard) differentiability of the monotonized quantile and probability curves with respect to the original curves in discontinuous directions, tangentially to a set of continuous functions. In doing so, the compact differentiability of the rearrangementrelated operators is established.
top of page

Exponential Error Bounds for Random Codes on the BSC
(abstract) 

David Forney (MIT LIDS)
One of Shannon's earliest results was his determination of the capacity of the binary symmetric channel (BSC). Shannon went on to show that, with randomly chosen codes and optimal decoding, the probability of decoding error decreases exponentially for any transmission rate less than capacity. Much of the important early work of Shannon, Elias, Fano and Gallager was devoted to determining bounds on the corresponding "error exponent."
A later approach to this problem, pioneered by Csiszar and Korner, and now adopted in many modern information theory courses such as 6.441, determines error exponents using largedeviation theory. This approach has several advantages:
(a) It can be remarkably simple and intuitive;
(b) It gives correct error exponents, not just bounds;
(c) It gives insight into how decoding errors typically occur.
We illustrate this approach by developing the correct error exponents at all rates for both random codes and random linear codes on the BSC, assuming optimal decoding. We discuss how decoding errors typically occur. In particular, we show why the classical algebraic coding approach never had any hope of approaching channel capacity, even on the BSC.
This talk is intended to be pitched at an elementary and tutorial level.
top of page

