MIT Stochastics and Statistics Seminar

Organizers: David Gamarnik, Devavrat Shah, Victor Chernozhukov

# Fall 2014

- Aug 15th, Florent Krzakala (Université Pierre et Marie Curie), E62-587, 2-3: Superposition codes and approximate-message-passing decoder
- Lenka Zdeborova (CEA), E62-587, 3:15-4:15: Clustering of sparse networks: Phase transitions and optimal algorithms

# Spring 2014

- May 30th, Nathan Kallus (MIT), E62-587, 11-12: Regression-Robust Designs of Controlled Experiments
- Alex Belloni (Duke University), E62-587, 11-12: Uniform Post Selection Inference for Z-estimation problems
- Antar Bandyopadhyay (University of California, Berkeley), E62-587, 11-12: De-Preferential Attachment Random Graphs
- Sahand Negahban (Yale University), E62-587, 2-3: Computationally and Statistically Efficient Estimation in High-Dimensions
- Joel Spencer (Courant Institute, New York University), E62-587, 11-12: Avoiding Outliers
- Sébastien Bubeck (Princeton University), E62-587, 11-12: On the influence of the seed graph in the preferential attachment model
- Alexander Rakhlin (University of Pennsylvania, The Wharton School), E62-587, 11-12: Learning and estimation: separated at birth, reunited at last
- Alexandre Tsybakov (CREST-ENSAE), E62-587, 11-12: Linear and Conic Programming Approaches to High-Dimensional Errors-in-variables Models
- Karthekeyan Chandrasekaran (Harvard), E62-587, 11-12: Integer Feasibility of Random Polytopes
- Michael Brautbar (MIT), E62-587, 11-12: On the Power of Adversarial Infections in Networks

# Fall 2013

- Dec 13th, Nelly Litvak (University of Twente), E62-587, 11-12: Degree-degree dependencies in random graphs with heavy-tailed degrees
- Ramon van Handel (Princeton University), E62-587, 11-12: Conditional Phenomena in Time and Space
- Lie Wang (MIT), E62-587, 4-5: Multivariate Regression with Calibration
- David Choi (Heinz College, Carnegie Mellon University), E62-587, 11-12: Consistency of Co-clustering exchangeable graph data
- Jim Dai (Cornell University), E62-587, 11-12: Semimartingale reflecting Brownian motions: tail asymptotics for stationary distributions
- Elad Hazan (Technion): Sublinear Optimization

# Spring 2013

- May 3rd, Rahul Jain (University of Southern California): Transitory Queueing Systems
- Yashodhan Kanoria (MSR New England and Columbia University): Which side chooses in large random matching markets?
- Rahul Jain (University of Southern California): The Art of Gambling in a Team: Multi-Player Multi-Armed Bandits

# Fall 2012

- November 30th, Rahul Mazumder (MIT): Low-rank Matrix Completion: Statistical Models and Large Scale Algorithms
- Kuang Xu (MIT): Queueing system topologies with limited flexibility
- Shankar Bhamidi (University of North Carolina): Cancelled
- Philippe Rigollet (Princeton University): Optimal detection of a sparse principal component

# Spring 2012

- May 18th, Alexei Borodin (Massachusetts Institute of Technology): Growth of random surfaces
- Semen Shlosman (CNRS, France and Institute for Information Transmission Problems, Russia): The Coherence Phase Transition
- Alexander Rybko (Institute for Information Transmission Problems, Russia): Mean-field Limit for General Queueing Networks on Infinite Graphs
- Guy Bresler (University of California, Berkeley): Information theory of DNA sequencing
- Andrei Raigorodskii (Yandex and Moscow Institute of Physics and Technology): Research Groups at Yandex and Moscow Institute of Physics and Technology
- Andrei Raigorodskii (Yandex and Moscow Institute of Physics and Technology): Web Graph Models and Their Applications
- Daniil Musatov (Yandex and Moscow Institute of Physics and Technology): Conditional Coding with Limiting Computational Resources
- Liudmila Ostroumova (Yandex and Moscow Institute of Physics and Technology): An Application of Talagrand's Inequality to Prove a Concentration of Second Degrees in the Buckley-Osthus Random Graph Model
- Dmitry Shabanov (Yandex and Moscow Institute of Physics and Technology): Van der Waerden Number and Coloring of Hypergraphs with Large Girth

# Fall 2011

- Dec 16th, Yuan Zhong (MIT ORC): Delay optimality in switched networks
- Eitan Bachmat (Ben-Gurion University): Does God play dice? An I/O scheduling and airplane boarding perspective
- Erol Peköz (Boston University): Asymptotics for preferential attachment random graphs via Stein's method

# Spring 2009

- May 1st, Vivek Goyal (MIT EECS): On Resolution, Sparse Signal Recovery, and Random Access Communication
- Vivek F. Farias (MIT Sloan): The Smoothed Linear Program for Approximate Dynamic Programming
- Scott Sheffield (MIT Math): Fractional simple random walk
- Mokshay Madiman (Yale University): A New Look at the Compound Poisson Distribution and Compound Poisson Approximation using Entropy
- Mohsen Bayati (Microsoft Research New England): Sequential algorithms for generating random graphs

# Fall 2008

- Sep 23rd, Benoît Collins (University of Ottawa): Convergence of unitary matrix integrals

# Spring 2008

- May 23rd, Johan van Leeuwaarden (Eindhoven University of Technology, EURANDOM, NYU): The Gaussian random walk, sampling Brownian motion, and the Riemann zeta function
- Edward Farhi (MIT Physics): Quantum Computation by Adiabatic Evolution
- Ton Dieker (Georgia Tech I&SE): Large deviations for random walks under subexponentiality: the big-jump domain
- Peter Glynn (Stanford MS&E): Bounds on Stationary Expectations for Markov Processes
- Daron Acemoglu (MIT Econ): Fragility of Asymptotic Agreement under Bayesian Learning
- Victor Chernozhukov (MIT Econ & ORC): Quantile and Probability Curves without Crossing

# Fall 2007

- Nov 16th, David Forney (MIT LIDS): Exponential Error Bounds for Random Codes on the BSC

# Abstracts

## Superposition codes and approximate-message-passing decoder
Florent Krzakala (Université Pierre et Marie Curie)

Superposition codes are an asymptotically capacity-achieving scheme for the additive white Gaussian noise channel. I will first show how a practical iterative decoder can be built using a belief-propagation-type approach, closely related to the one used in compressed sensing and sparse estimation problems. Second, I will show how the idea of spatial coupling in this context allows one to build efficient and practical capacity-achieving coding and decoding schemes. The links between the present problem, sparse estimation, and non-parametric belief propagation for continuous variables will also be discussed. References: Replica Analysis and Approximate Message Passing Decoder for Superposition Codes, J. Barbier and F. Krzakala, ISIT 2014; Statistical physics-based reconstruction in compressed sensing, F. Krzakala et al., Phys. Rev. X 2, 021005 (2012).

## Clustering of sparse networks: Phase transitions and optimal algorithms
Lenka Zdeborova (CEA)

A central problem in analyzing networks is partitioning them into modules or communities: clusters with a statistically homogeneous pattern of links to each other or to the rest of the network. A principled approach to this problem is to fit the network to a stochastic block model; this task, however, is intractable exactly. In this talk we discuss the application of the belief propagation algorithm to module detection. In the first part we present an asymptotically exact analysis of the stochastic block model. We quantify properties of the detectability/undetectability phase transition and the easy/hard phase transition for the module detection problem. In the second part we introduce a new class of spectral algorithms based on a non-backtracking walk on the directed edges of the graph.
We show that our algorithm is optimal for graphs generated by the stochastic block model, detecting communities all the way down to the theoretical limit. We also show the spectrum of the non-backtracking operator for some real-world networks, illustrating its advantages over traditional spectral clustering. References: Phase transition in the detection of modules in sparse networks, A. Decelle, F. Krzakala, C. Moore, L. Zdeborová, Phys. Rev. Lett. 107, 065701 (2011); Spectral redemption: clustering sparse networks, F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, P. Zhang, Proceedings of the National Academy of Sciences 110, no. 52 (2013): 20935-20940.

## Regression-Robust Designs of Controlled Experiments
Nathan Kallus (MIT)

Achieving balance between experimental groups is a cornerstone of causal inference: without balance, any observed difference may be attributed to a difference other than the treatment alone. In controlled/clinical trials, where the experimenter controls the administration of treatment, complete randomization of subjects has been the gold standard for achieving this balance because it allows for unbiased and consistent estimation and inference in the absence of any a priori knowledge or measurements. However, since estimator variance under complete randomization may be slow to converge, experimental designs that balance pre-treatment measurements (baseline covariates) are in pervasive use, including randomized block designs, pairwise-matched designs, and re-randomization. We formally argue that absolutely no balance better than complete randomization's can be achieved without partial structural knowledge about the treatment effects. Therefore, the fact that balancing designs are in popular use and are advocated means that partial knowledge is in fact available to the researcher, just as one would expect from a researcher conducting an experiment in her own domain of expertise.
We propose a novel framework for formulating such knowledge using functional analysis. It subsumes all of the aforementioned designs in that it recovers them as optimal under certain modeling choices, thus theoretically characterizing their comparative power under different assumptions and suggesting extensions to multi-arm trials. Furthermore, it suggests new optimal designs that are based on more robust nonparametric modeling and that offer extensive gains in precision and power. In certain cases we are able to argue linear convergence $$O(1/2^n)$$ to the sample average treatment effect (as compared to the usual $$O(1/n)$$ convergence). We characterize generally the unbiasedness, variance, and consistency of the resulting estimators; solve the design problem; and develop appropriate inferential algorithms for Fisherian and Neymanian hypotheses. We uncover connections to Bayesian experimental design and make extensions to dealing with non-compliance.

## Uniform Post Selection Inference for Z-estimation problems
Alex Belloni (Duke University)

In this talk we consider inference with high-dimensional data. We propose new methods for estimating and constructing confidence regions for a regression parameter of primary interest $$\alpha_0$$, a parameter in front of the regressor of interest, such as the treatment variable or a policy variable. We show how to apply these methods to Z-estimators (for example, logistic regression and quantile regression). These methods allow us to estimate $$\alpha_0$$ at the root-n rate when the total number p of other regressors, called controls, exceeds the sample size n, using sparsity assumptions. The sparsity assumption means that only s unknown controls are needed to accurately approximate the nuisance part of the regression function, where s is smaller than n. Importantly, the estimators and the resulting confidence regions are "honest" in the formal sense that their properties hold uniformly over s-sparse models.
Moreover, these procedures do not rely on traditional "consistent model selection" arguments for their validity; in fact, they are robust with respect to "moderate" model-selection mistakes in the variable selection steps. Furthermore, the estimators are semi-parametrically efficient in the sense of attaining the semi-parametric efficiency bounds for the class of models considered.

## De-Preferential Attachment Random Graphs
Antar Bandyopadhyay (University of California, Berkeley)

In this talk we will introduce a new model of a growing sequence of random graphs in which a new vertex is less likely to attach to an existing vertex of high degree and more likely to attach to a vertex of low degree. In contrast to the well-studied preferential attachment random graphs, where higher-degree vertices are preferred, we call our model the de-preferential attachment random graph model. We consider two types of de-preferential attachment models: inverse de-preferential, where the attachment probabilities are inversely proportional to the degree, and linear de-preferential, where the attachment probabilities are proportional to $$c$$-degree, where $$c > 0$$ is a constant. We give the asymptotic degree distribution for both models and show that the limiting degree distribution has a very thin tail. We also show that for a fixed vertex $$v$$, the degree grows as $$\sqrt{\log n}$$ in the inverse de-preferential case and as $$\log n$$ in the linear case. Some of the results will also be generalized to the case where each new vertex joins $$m > 1$$ existing vertices. This is joint work with Subhabrata Sen, Stanford University.

## Computationally and Statistically Efficient Estimation in High-Dimensions
Sahand Negahban (Yale University)

Modern techniques in data accumulation and sensing have led to an explosion in both the volume and variety of data.
Many of the resulting estimation problems are high-dimensional, meaning that the number of parameters to estimate can be far greater than the number of examples. A major focus of my work has been developing an understanding of how hidden low-complexity structure in large datasets can be used to develop computationally efficient estimation methods. I will discuss a framework for establishing the error behavior of a broad class of estimators under high-dimensional scaling. I will then discuss how to compute these estimates and draw connections between the statistical and computational properties of our methods. Interestingly, the same tools used to establish good high-dimensional estimation performance have a direct impact on optimization: better-conditioned statistical problems lead to more efficient computational methods.

## Avoiding Outliers
Joel Spencer (Courant Institute, New York University)

Given $$n$$ vectors $$r_j$$ in $$n$$-space with all coefficients in $$[-1,+1]$$, one wants a vector $$x=(x_1,\ldots,x_n)$$ with all $$x_i=+1$$ or $$-1$$ so that all dot products $$x\cdot r_j$$ are at most $$K\sqrt{n}$$ in absolute value, for $$K$$ an absolute constant. A random $$x$$ would make $$x\cdot r_j$$ roughly Gaussian, but there would be outliers. The existence of such an $$x$$ was first shown by the speaker, resolving a discrepancy question of Paul Erdős. However, the original argument did not give an effective algorithm. The first algorithm was found by Nikhil Bansal. The approach here, due to Lovett and Meka, is to begin with $$x=(0,\ldots,0)$$ and let it float in a kind of restricted Brownian motion until all the coordinates hit the boundary. There is some nice probability using multidimensional Gaussians, together with a martingale result on a "Gaussian Casino." Paul Erdős was an extraordinary twentieth-century mathematician with a unique lifestyle, and the speaker will include personal reminiscences of him.
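A quick numerical illustration of the gap the talk addresses (a sketch, not from the talk itself): for random ±1 rows $$r_j$$, even the best of several random sign vectors $$x$$ typically has $$\max_j |x\cdot r_j|$$ of order $$\sqrt{2n\log n}$$, while Spencer's theorem guarantees a coloring achieving $$K\sqrt{n}$$.

```python
import math
import random

def max_dot(x, rows):
    """Largest |x . r_j| over all rows r_j."""
    return max(abs(sum(xi * ri for xi, ri in zip(x, row))) for row in rows)

random.seed(1)
n = 256
rows = [[random.choice((-1, 1)) for _ in range(n)] for _ in range(n)]

# Best of 20 random colorings: still of order sqrt(2 n log n), not sqrt(n).
disc = min(
    max_dot([random.choice((-1, 1)) for _ in range(n)], rows)
    for _ in range(20)
)
print(disc, math.sqrt(n), math.sqrt(2 * n * math.log(n)))
```

Closing that gap without exhaustive search is exactly what the Bansal and Lovett-Meka algorithms accomplish.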
## On the influence of the seed graph in the preferential attachment model
Sébastien Bubeck (Princeton University)

We are interested in the following question: suppose we generate a large graph according to the linear preferential attachment model; can we say anything about the initial (seed) graph? A precise answer to this question could lead to new insights for the diverse applications of the preferential attachment model. In this work we focus on the case of trees grown according to the preferential attachment model. We first show that the seed has no effect from a weak local limit point of view. On the other hand, we conjecture that different seeds lead to different distributions of limiting trees from a total variation point of view. We take some steps toward proving this conjecture by focusing on star seeds.

## Learning and estimation: separated at birth, reunited at last
Alexander Rakhlin (University of Pennsylvania, The Wharton School)

We consider the problem of regression in three scenarios: (a) random design under the assumption that the model F is correctly specified; (b) distribution-free statistical learning with respect to a reference class F; and (c) online regression with no assumption on the generative process. The first problem is often studied in the literature on nonparametric estimation, the second falls within the purview of statistical learning theory, and the third is studied within the online learning community. It is recognized that the complexity of the class F plays the key role in determining the minimax behavior: the importance of entropy in the study of estimation goes back to Le Cam, Ibragimov and Khas'minskii, and Birgé; within the setting of statistical learning, the importance of entropy was established in the work of Vapnik and Chervonenkis and in subsequent works on uniform laws of large numbers within empirical process theory. The corresponding complexities for online learning have only been found in the past few years.
But do these three problems really differ from the minimax point of view? The question, which boils down to understanding well-specified and misspecified models, will be addressed in this talk. Joint work with Karthik Sridharan and Sasha Tsybakov.

## Linear and Conic Programming Approaches to High-Dimensional Errors-in-variables Models
Alexandre Tsybakov (CREST-ENSAE)

We consider the regression model with observation error in the design, when the dimension can be much larger than the sample size and the true parameter is sparse. We propose two new estimators, based on linear and conic programming, and we prove that they satisfy oracle inequalities similar to those for the model with exactly known covariates. The only difference is that they contain additional scaling with the $$\ell_1$$ or $$\ell_2$$ norm of the true parameter. The scaling with the $$\ell_2$$ norm is minimax optimal, and it is achieved by the conic programming estimator, while the scaling with the $$\ell_1$$ norm is achieved by the linear programming estimator, which is easier to implement. This is joint work with Mathieu Rosenbaum.

## Integer Feasibility of Random Polytopes
Karthekeyan Chandrasekaran (Harvard)

Optimization problems are ubiquitous in contemporary engineering. A principal barrier to solving several real-world optimization problems is input uncertainty. In this talk, I will present new tools to study probabilistic instances of integer programs. As an application, I will show a phase-transition phenomenon in a simple distribution model for random integer programs. Our main tool is an elementary connection between integer programming and matrix discrepancy. I will describe this connection and derive matching upper and lower bounds on the discrepancy of random Gaussian matrices. Based on joint work with Santosh Vempala.
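The flavor of such a feasibility phase transition can be sensed in a toy experiment (an illustrative sketch only; the model, dimensions, and radii below are simplified assumptions, not the ones from the talk): describe a ball around a non-integer center as an intersection of random Gaussian halfspaces, and check by brute force whether the resulting polytope contains an integer point. A generous radius makes the instance integer-feasible; a tiny one does not.

```python
import itertools
import math
import random

def polytope_has_integer_point(A, b, box=2):
    """Brute force: does {x : A x <= b} contain an integer point in [-box, box]^d?"""
    d = len(A[0])
    for x in itertools.product(range(-box, box + 1), repeat=d):
        if all(sum(a * xi for a, xi in zip(row, x)) <= bi
               for row, bi in zip(A, b)):
            return True
    return False

random.seed(0)
d, m = 4, 40
center = [0.5] * d  # a point as far from the integer lattice as possible
A = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]

def feasible(radius):
    # Halfspace a.(x - center) <= radius * ||a|| for each row a, so the
    # polytope contains the ball of the given radius around the center.
    b = [radius * math.sqrt(sum(a * a for a in row)) +
         sum(a * c for a, c in zip(row, center)) for row in A]
    return polytope_has_integer_point(A, b)

print(feasible(2.0), feasible(0.1))
```

With radius 2.0 the ball already contains a lattice point, so feasibility is immediate; with radius 0.1 the forty random halfspaces cut away every nearby lattice point with overwhelming probability.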
## On the Power of Adversarial Infections in Networks
Michael Brautbar (MIT)

Over the last decade we have witnessed the rapid proliferation of online networks and Internet activity. Such activity is considered a blessing, but it brings with it a large increase in the risk of computer malware: malignant software that actively spreads from one computer to another. To date, the majority of existing models of malware spread use stochastic behavior, where the set of neighbors infected from the current set of infected nodes is chosen obliviously. In this work, we initiate the study of adversarial infection strategies, which can decide intelligently which neighbors of infected nodes to infect next in order to maximize their spread, while maintaining a similar "signature" to the oblivious stochastic infection strategy so as not to be discovered. We first establish that computing optimal and near-optimal adversarial strategies is computationally hard. We then identify necessary and sufficient conditions, in terms of network structure and edge infection probabilities, under which the adversarial process can infect polynomially more nodes than the stochastic process while maintaining a similar signature. Among our results is a surprising connection between an additional structural quantity of interest in a network, the network toughness, and adversarial infections. Based on the network toughness, we characterize networks where the existence of adversarial strategies that are pandemic (infect all nodes) is guaranteed, as well as efficiently computable. Based on joint work with Moez Draief and Sanjeev Khanna.

## Degree-degree dependencies in random graphs with heavy-tailed degrees
Nelly Litvak (University of Twente)

Mixing patterns in large self-organizing networks, such as the Internet, the World Wide Web, and social and biological networks, are often characterized by degree-degree dependencies between neighboring nodes.
In assortative networks the degree-degree dependencies are positive (nodes with similar degrees tend to connect to each other), while in disassortative networks these dependencies are negative. One problem with the commonly used Pearson correlation coefficient, also known as the assortativity coefficient, is that its magnitude decreases with the network size in disassortative networks. This makes it impossible to compare mixing patterns, for example, in two web crawls of different sizes. In this talk we analytically investigate degree-degree dependencies for scale-free graph sequences. In particular, we show that the Pearson correlation coefficient is non-negative in the large-graph limit when the asymptotic degree distribution has an infinite third moment. Furthermore, we provide examples where the Pearson correlation coefficient converges to zero in a network with strong negative degree-degree dependencies, and another example where this coefficient converges in distribution to a random variable. We suggest an alternative degree-degree dependency measure, based on Spearman's rho, and prove that this statistical estimator converges to an appropriate limit under quite general conditions. These conditions are proved to hold in common network models, such as the configuration model and the preferential attachment model.

## Conditional Phenomena in Time and Space
Ramon van Handel (Princeton University)

Random phenomena are ubiquitous throughout science and engineering. Beyond the study of stochastic models in their own right, it is important in many applications to understand what information can be extracted on the basis of observed data. Mathematically, such "data assimilation" problems motivate the investigation of probabilistic phenomena that arise from conditioning.
This topic has connections to several areas of probability, ergodic theory, measure theory, and statistical mechanics, as well as practical implications for the design and analysis of algorithms for data assimilation in complex systems. In this talk I will give an overview of some old and new problems in this setting, with emphasis on the behavior of conditional distributions and their approximations on long time intervals and in high dimension.

## Multivariate Regression with Calibration
Lie Wang (MIT)

We propose a new method, named calibrated multivariate regression (CMR), for fitting high-dimensional multivariate regression models. Compared to existing methods, CMR calibrates the regularization for each regression task with respect to its noise level, so that it is simultaneously tuning-insensitive and achieves improved finite-sample performance. We also develop an efficient smoothed proximal gradient algorithm to implement it. Theoretically, we prove that CMR achieves the optimal rate of convergence in parameter estimation. We illustrate the usefulness of CMR by thorough numerical simulations and show that CMR consistently outperforms existing multivariate regression methods. We also apply CMR to a brain activity prediction problem and find that CMR even outperforms handcrafted models created by human experts.

## Consistency of Co-clustering exchangeable graph data
David Choi (Heinz College, Carnegie Mellon University)

We analyze the problem of partitioning a 0-1 array or bipartite graph into subgroups (also known as co-clustering), under the relatively mild assumption that the data is generated by a general nonparametric process. This problem can be thought of as co-clustering under model misspecification; we show that the additional error due to misspecification can be bounded by $$O(n^{-1/4})$$.
Our result suggests that under certain sparsity regimes, community detection algorithms may be robust to modeling assumptions, and that their usage is analogous to the usage of histograms in exploratory data analysis. The result also has connections to the recent literature on exchangeable graph models, graph limits, and graphons.

## Semimartingale reflecting Brownian motions: tail asymptotics for stationary distributions
Jim Dai (Cornell University)

Multidimensional semimartingale reflecting Brownian motions (SRBMs) arise as the diffusion limits for stochastic networks. I will describe a powerful methodology for obtaining the tail asymptotics of the stationary distribution of an SRBM. The methodology uses a moment generating function version of the basic adjoint relationship that characterizes the stationary distribution. The tail asymptotics can be used to predict quality of service in stochastic networks. It can also be used to speed up an algorithm, devised in Dai and Harrison (1992), to compute the stationary distribution. This talk is based on joint work with Masakiyo Miyazawa and Jian Wu.

## Sublinear Optimization
Elad Hazan (Technion)

In many modern optimization problems, specifically those arising in machine learning, the amount of data is too large to apply standard convex optimization methods. We'll discuss new optimization algorithms that make use of randomization to prune the data and produce a correct solution while running in time smaller than the size of the data representation, i.e., sublinear running time. We'll present such sublinear-time algorithms for linear classification, support vector machine training, semidefinite programming, and other optimization problems. These new algorithms are based on a primal-dual approach, and use a combination of novel sampling techniques and randomized implementations of online learning algorithms.
We'll describe information-theoretic lower bounds showing that our running times are nearly best possible in the unit-cost RAM model.

## Transitory Queueing Systems
Rahul Jain (University of Southern California)

"Transitory" queueing systems, namely those that exist only for a finite time, are all around us. And yet, current queueing theory provides very few models and even fewer tools to understand them. In this talk, I'll introduce a model that may be more appropriate in "transitory" queueing situations. I'll start by talking about the "concert arrival game" that we introduced. This model captures the tradeoff in decision-making when users must decide whether to arrive early and wait in a queue, or arrive late but miss out on opportunities (e.g., the best seats at a concert). With such decision-making by the users, it is a folly to assume that the arrival process is exogenous and a well-behaved renewal process. Thus, under the fluid model, we characterize the equilibrium arrival profile and the "price of anarchy" (PoA). We also give some simple mechanisms to reduce it. We then present various extensions, including to general feedforward networks where routing itself is strategic. The concert queueing game led us to the Delta(i)/GI/1 queue, a new model for transitory queueing systems in which each user picks their time of arrival from a common distribution. We derive the fluid and diffusion process limits for this model. We find that the limiting queue length process is a reflection of a combination of a Brownian motion and a Brownian bridge. The convergence is established in the M1 topology, and we also show that it is not possible to obtain it in the stronger J1 topology. Such models allow us to study transitory queueing systems that are not very amenable to extant techniques in queueing theory.

## Which side chooses in large random matching markets?
Yashodhan Kanoria (MSR New England and Columbia University)

We analyze large random two-sided matching markets with different numbers of men and women. We show that being on the short side of the market confers a large advantage, in the sense that the short side roughly chooses the matching while the long side roughly gets chosen. In addition, the matching is roughly the same under all stable matchings. Consider a market with n men and n+k women, for arbitrary $$1 \leq k \leq n/2$$. We show that, with high probability, the men's average rank of wives is less than $$5\log(n/k)$$ in any stable match. Further, the men's average rank of wives in the women-optimal stable match (WOSM) is within a factor $$1+\epsilon$$ of that in the men-optimal stable match (MOSM). On the other hand, the women's average rank of husbands is at least $$n/(5 \log(n/k))$$ in all stable matches, and is again very nearly the same in the WOSM and MOSM. Thus, our results imply that the core is likely to be "small". Simulations show that the same features arise even in small markets. Joint work with Itai Ashlagi and Jacob D. Leshno.

## The Art of Gambling in a Team: Multi-Player Multi-Armed Bandits
Rahul Jain (University of Southern California)

Multi-armed bandits are an elegant model of learning in an unknown and uncertain environment. Such models are relevant in many scenarios, and of late have received increased attention due to various problems of distributed control arising in wireless networks, pricing models on the Internet, etc. We consider a non-Bayesian multi-armed bandit setting proposed by Lai and Robbins in the mid-80s. There are multiple arms, each of which generates an i.i.d. reward from an unknown distribution. There are multiple players, each of whom is choosing which arm to play. If two or more players choose the same arm, they all get zero rewards.
The problem is to design a learning algorithm to be used by the players that results in an orthogonal matching of players to arms (e.g., users to channels in wireless networks) and, moreover, minimizes the expected regret. We first consider this as an online bipartite matching problem. We model this combinatorial problem as a classical multi-armed bandit problem, but with dependent arms, and propose an index-based learning algorithm that achieves logarithmic regret. From prior results, it is known that this is order-optimal. We then consider the distributed problem, where players do not communicate or coordinate in any way. We propose an index-based algorithm that uses Bertsekas' auction mechanism to determine the bipartite matching. We show that the algorithm has expected regret at most near-log-squared. This is the first distributed multi-armed bandit learning algorithm in a general setting.

## Low-rank Matrix Completion: Statistical Models and Large Scale Algorithms
Rahul Mazumder (MIT)

Low-rank matrix regularization is an important area of research in statistics and machine learning with a wide range of applications. For example, in the context of missing-data imputation arising in recommender systems (e.g., the Netflix Prize), DNA microarrays, and audio signal processing, among others, the object of interest is a matrix X for which the training data comprise a few noisy linear measurements. The task is to estimate X under a low-rank constraint and possibly additional affine constraints on X. In practice, the matrix dimensions frequently range from hundreds of thousands to even a million, leading to severe computational challenges. In this talk, I will describe computationally tractable models and scalable (convex) optimization-based algorithms for a class of low-rank regularized problems. Exploiting problem-specific statistical insights and problem structure, and using novel tools for large-scale SVD computations, play important roles in this task.
This is joint work with Trevor Hastie, Rob Tibshirani and Dennis Sun (Stanford Statistics). top of page Optimal detection of a sparse principal component Philippe Rigollet (Princeton University) Sparse Principal Component Analysis (SPCA) is a remarkably useful tool for practitioners who have been relying on ad-hoc thresholding methods. Our analysis aims at providing a framework to test whether the data at hand indeed contains a sparse principal component. More precisely, we propose an optimal test procedure to detect the presence of a sparse principal component in a high-dimensional covariance matrix. Our minimax optimal test is based on a sparse eigenvalue statistic. Alas, computing this test is known to be NP-complete in general, and we describe a computationally efficient alternative test using convex relaxations. Our relaxation is also proved to detect sparse principal components at near optimal detection levels and performs very well on simulated datasets. Moreover, we exhibit some evidence from average case complexity that this slight deterioration may be unavoidable. top of page Growth of random surfaces Alexei Borodin (Massachusetts Institute of Technology) The goal of the talk is to describe a class of solvable random growth models of one- and two-dimensional interfaces. The growth is local (distant parts of the interface grow independently), it has a smoothing mechanism (fractal boundaries do not appear), and the speed of growth depends on the local slope of the interface. The models enjoy a rich algebraic structure that is reflected through closed determinantal formulas for the correlation functions. Large-time asymptotic analysis of such formulas reveals asymptotic features of the emerging interface in different scales. Macroscopically, a deterministic limit shape phenomenon can be observed. Fluctuations around the limit shape range from universal laws of Random Matrix Theory to conformally invariant Gaussian processes in the plane.
On the microscopic (lattice) scale, certain universal determinantal random point processes arise. No preliminary knowledge of the material will be assumed. top of page The Coherence Phase Transition Semen Shlosman (CNRS, France and Institute for Information Transmission Problems, Russia) We study particle systems corresponding to highly connected queueing networks. We examine the validity of the so-called Poisson Hypothesis (PH), which predicts that the Markov process, describing the evolution of such a particle system, started from a reasonable initial state, approaches the equilibrium in time independent of the size of the network. This is indeed the case in many situations. However, there are networks for which the relaxation process slows down. This behavior reflects the fact that the corresponding infinite system undergoes a phase transition. It is characterized by the property that different nodes of the network start to evolve in a synchronous way. Such a transition can happen only when the load per node exceeds some critical value, while in the low-load situation the PH behavior holds. The load thus plays here the same role as the inverse temperature in statistical mechanics. We will mention a related open problem of ergodicity of interacting particle systems with a unique stationary state. The talk is based on joint works with A. Rybko and A. Vladimirov. top of page Mean-field Limit for General Queueing Networks on Infinite Graphs Alexander Rybko (Institute for Information Transmission Problems, Russia) We study the sequences of Markov processes defining the evolution of a general symmetric queueing network with the number of nodes tending to infinity. These sequences have limits which are nonlinear Markov processes. The time evolution of these nonlinear Markov processes is sometimes simpler and easier to analyze than the evolution of the corresponding prelimit Markov processes on finite graphs.
This fact makes it possible to find good approximations to the behavior of symmetric queueing networks on large graphs. Examples will be discussed. The talk is based on a joint work with S. Shlosman. top of page Information theory of DNA sequencing Guy Bresler (University of California, Berkeley) DNA sequencing is the basic workhorse of modern biology and medicine. Shotgun sequencing is the dominant technique used: many randomly located short fragments called reads are extracted from the DNA sequence, and these reads are assembled to reconstruct the original DNA sequence. Despite major effort by the research community, the DNA sequencing problem remains unresolved from a practical perspective: it is currently not known how to produce a good assembly of most read data-sets. By drawing an analogy between the DNA sequencing problem and the classic communication problem, we define an information-theoretic notion of sequencing capacity. This is the maximum number of DNA base pairs that can be resolved reliably per read, and provides a fundamental limit to the performance that can be achieved by any assembly algorithm. We compute the sequencing capacity explicitly assuming an i.i.d. model for the DNA sequence and a noiseless read process. These basic results are then partially extended to arbitrary DNA sequences: the capacity is determined by a simple sufficient statistic of the sequence which can be computed for actual genomes. The talk is based on joint works with Ma'ayan Bresler, Abolfazl Motahari, and David Tse. Speaker Bio: Guy Bresler is currently a PhD candidate in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. Prior to that, he received the B.S. degree in electrical and computer engineering and the M.S. degree in mathematics from the University of Illinois at Urbana-Champaign, both in 2006. Guy is the recipient of an NSF Graduate Research Fellowship, a Vodafone Graduate Fellowship, the Barry M.
Goldwater Scholarship, a Vodafone Undergraduate Scholarship, the E.C. Jordan Award from the ECE department at UIUC, and the Roberto Padovani Scholarship from Qualcomm. His research interests include information theory, applied probability, and computational biology. top of page Research Groups at Yandex and Moscow Institute of Physics and Technology Andrei Raigorodskii (Yandex and Moscow Institute of Physics and Technology) In my talk I will present our research groups at Yandex, which is the main search engine in Russia, and at Moscow Institute of Physics and Technology, which is a kind of Russian MIT. top of page Web Graph Models and Their Applications Andrei Raigorodskii (Yandex and Moscow Institute of Physics and Technology) In the past 20 years, there has been a lot of work concerning modeling real networks such as the Internet, WWW, social networks, biological networks, etc. In our talk, we will mainly concentrate on models devoted to incorporating properties of the web graph. We will give a survey of different such models including those of Bollob\'as--Riordan, Buckley--Osthus, Cooper--Frieze, etc. We will also present some new related mathematical results obtained by our group at Yandex, Moscow Institute of Physics and Technology, and Moscow State University. Finally, we will discuss some applications of these results to ranking as well as some future research directions. top of page Conditional Coding with Limiting Computational Resources Daniil Musatov (Yandex and Moscow Institute of Physics and Technology) Consider the following situation. Alice knows string x and Bob knows string y. There is a limited-capacity information channel from Alice to Bob. Alice wants to transmit a message that is sufficient for Bob to recover x. What is the minimum capacity that allows her to do it? Obviously, there is a theoretical minimum: the amount of information in x that is not contained in y.
Astonishingly enough, this is a precise estimate: Alice can produce an optimal code even though she does not know what information Bob already possesses. In the Shannon entropy framework this fact is stated as the Slepian-Wolf theorem. In the Kolmogorov complexity framework (where both Alice and Bob use computable functions and logarithmic advice) it was proven by Muchnik. In the talk we will give another proof of Muchnik's result and extend it to the space-bounded case, where Alice and Bob use space-bounded computations and advice is still logarithmic. top of page An Application of Talagrand's Inequality to Prove a Concentration of Second Degrees in the Buckley-Osthus Random Graph Model Liudmila Ostroumova (Yandex and Moscow Institute of Physics and Technology) We consider the random graph model $H_{a,m}^n$. An explicit construction of this model was suggested by Buckley and Osthus. For any fixed positive integer $m$ and positive real $a$ we define a random graph $H_{a,m}^n$ in the following way. The graph $H_{a,1}^1$ consists of one vertex and one loop. Given a graph $H_{a,1}^{t-1}$ we transform it into $H_{a,1}^{t}$ by adding one new vertex and one new random edge $(t,i)$, where $i \in \{1, \dots, t \}$ and $\Prob(i=t) = \frac{a}{(a+1)t-1}$, $\Prob(i=s) = \frac{d_{H_{a,1}^{t-1}}(s) - 1 + a}{(a+1)t-1}$ if $1 \le s \le t-1$. To construct $H_{a,m}^n$ with $m>1$ we start from $H_{a,1}^{mn}$. Then we identify the vertices $1, \dots, m$ to form the first vertex; we identify the vertices $m+1, \dots, 2m$ to form the second vertex, etc. The Buckley-Osthus random graph is a model of real-world networks. The parameter $a$ is called the ``initial attractiveness'' of a vertex. It was shown (by Buckley and Osthus for integer $a$ and by Grechnikov for real $a$) that the degree sequence in this model follows a power law distribution with exponent $-2-a$. We study second degrees of vertices in $H_{a,m}^n$.
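The recursive definition of $H_{a,1}^n$ above translates directly into a sampler. A sketch (my own illustration, tracking only the degree sequence):

```python
import random

def buckley_osthus_degrees(n, a, seed=0):
    """Degree sequence of H_{a,1}^n, built exactly as in the definition above:
    vertex t gets a loop with probability a/((a+1)t - 1), and otherwise attaches
    to an old vertex s with probability (d(s) - 1 + a)/((a+1)t - 1)."""
    rng = random.Random(seed)
    deg = [2]                              # H_{a,1}^1: one vertex with one loop
    for t in range(2, n + 1):
        r = rng.random() * ((a + 1) * t - 1)
        if r < a:
            deg.append(2)                  # loop at the new vertex
            continue
        r -= a
        for s in range(len(deg)):          # linear scan: O(n^2) overall, fine for a sketch
            r -= deg[s] - 1 + a
            if r < 0:
                deg[s] += 1
                break
        else:                              # float round-off guard, essentially never taken
            deg[-1] += 1
        deg.append(1)
    return deg

deg = buckley_osthus_degrees(5000, a=1.0)
print(sum(deg) == 2 * 5000)                # each of the n edges adds 2 to the total degree
```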
By the second degree of $t$ we mean the number of edges adjacent to the neighbors of $t$ except for the edges adjacent to the vertex $t$. The second degree of a vertex is an approximation of its PageRank. We show that the distribution of second degrees in $H_{a,1}^n$ follows a power law with exponent $-1-a$. The proof of this result consists of two parts. The first step is to calculate the expectation of the number of vertices with second degree $k$, and the second step is to obtain concentration. I will talk about the second step. It is a non-trivial application of Talagrand's inequality. Instead of Talagrand's inequality it is possible to apply Azuma's inequality, but the result will be weaker. top of page Van der Waerden Number and Coloring of Hypergraphs with Large Girth Dmitry Shabanov (Yandex and Moscow Institute of Physics and Technology) The talk is devoted to the classical problem of estimating the Van der Waerden number $W(n,r)$. The famous Van der Waerden theorem states that, for any integers $n\ge 3$, $r\ge 2$, there exists the smallest integer $W(n,r)$ such that, for every $N\ge W(n,r)$, in any $r$-coloring of the set $\{1,2,\ldots,N\}$ there exists a monochromatic arithmetic progression of length $n$. The best upper bound for $W(n,r)$ in the general case was obtained by W.T. Gowers in 2001, who proved that $$W(n,r)\le 2^{2^{r^{2^{2^{n+9}}}}}.$$ Our talk concentrates on lower bounds for the Van der Waerden number. We shall show that lower estimates for $W(n,r)$ are closely connected with extremal problems concerning colorings of uniform hypergraphs with large girth. Our main result is a new lower bound for $W(n,r)$: $$W(n,r)\ge c\,\frac {r^{n-1}}{\ln n},$$ where $c>0$ is an absolute constant. The proof is based on a continuous-time random recoloring process.
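For very small parameters, Van der Waerden numbers can be computed by exhaustive search over colorings. A brute-force sketch (my own illustration) that recovers the classical value $W(3,2)=9$:

```python
from itertools import product

def has_mono_ap(coloring, n):
    """True if some arithmetic progression of length n is monochromatic."""
    N = len(coloring)
    for start in range(N):
        for step in range(1, N):
            if start + (n - 1) * step >= N:
                break
            if len({coloring[start + j * step] for j in range(n)}) == 1:
                return True
    return False

def vdw(n, r):
    """Smallest N such that every r-coloring of {1..N} has a monochromatic n-AP."""
    N = n
    while True:
        if all(has_mono_ap(c, n) for c in product(range(r), repeat=N)):
            return N
        N += 1

print(vdw(3, 2))   # classical value: 9
```

The r**N colorings make this search hopeless beyond toy cases, which is exactly why the talk's probabilistic lower-bound techniques are needed.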
top of page Delay optimality in switched networks (paper) Yuan Zhong (MIT ORC) We consider a switched (queueing) network in which there are constraints on which queues may be served simultaneously; such networks have been used to effectively model input-queued switches and wireless networks. The scheduling policy for such a network specifies which queues to serve at any point in time, based on the current state or past history of the system. In the main result, we provide a new class of online scheduling policies that achieve optimal average queue-size scaling for a class of switched networks including input-queued switches. In particular, this establishes the validity of a conjecture about optimal queue-size scaling for input-queued switches. Speaker Bio: Yuan Zhong is a doctoral candidate in the Operations Research Center at MIT, under the supervision of Devavrat Shah and John Tsitsiklis. He has a BA degree in Mathematics from the University of Cambridge and an MA degree in Mathematics from the California Institute of Technology. He is broadly interested in stochastic modeling and analysis, as well as their applications in various business and engineering domains. top of page Does God play dice? An I/O scheduling and airplane boarding perspective. Eitan Bachmat (Ben-Gurion University) Given N I/O requests to a disk drive, how long will it take the disk drive to service them? How long does it take to board an airplane? We will show that such questions are modeled by the same math that models the universe in relativity theory, namely, space-time (Lorentzian) geometry. As a result of this connection we will try to understand what is the best way to service I/O, what is the best airplane boarding policy, and what makes free-falling bodies take the trajectories that they do. On the way we may also encounter a silly card game and random matrices. The talk is self-contained; experience with boarding an airplane is helpful.
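One combinatorial quantity that appears in such space-time analyses of random orders (and in their random-matrix connections) is the length of the longest increasing subsequence of a random permutation. A minimal O(n log n) patience-sorting computation, offered as my own illustration rather than as the speaker's model:

```python
import bisect

def lis_length(seq):
    """Patience sorting: piles[k] holds the smallest possible tail of a strictly
    increasing subsequence of length k+1; the pile count is the LIS length."""
    piles = []
    for x in seq:
        k = bisect.bisect_left(piles, x)
        if k == len(piles):
            piles.append(x)
        else:
            piles[k] = x
    return len(piles)

print(lis_length([3, 1, 4, 1, 5, 9, 2, 6]))   # -> 4, e.g. 1, 4, 5, 9
```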
Speaker Bio: Eitan Bachmat is a senior lecturer at the Computer Science department of Ben-Gurion U. He received his PhD in mathematics from M.I.T. He works mostly on the design of storage systems, performance analysis and applied probability. He also consults for the storage industry and participated in the development of several commercially available systems. Consequently, he has more patents than academic papers. top of page Asymptotics for preferential attachment random graphs via Stein's method (paper) Erol Peköz (Boston University) Preferential attachment random graphs evolve in time by sequentially adding vertices and edges randomly so that connections to vertices having high degree are favored. There has been an explosion of research on these models since Barabasi and Albert popularized them in 1999 to explain the so-called power law behavior observed in real networks such as the graph of the Internet, where Web pages correspond to vertices and hyperlinks between them correspond to edges. In this talk we will show how to derive rates of convergence for the distribution of the degree of a fixed vertex to its distributional limit using a new variation of Stein's method that relies on finding couplings to motivate distributional transformations for which limit distributions are fixed points. Surprisingly little is known about these limit distributions despite the large literature on such graphs. top of page On Resolution, Sparse Signal Recovery, and Random Access Communication (abstract) Vivek Goyal (MIT EECS) Resolution of a data acquisition system is usually thought to be determined completely by sensing properties, such as density and accuracy of measurements. This talk advocates the view that resolution is, in addition, dependent on signal modeling and the complexity of computations allowed.
This view is developed concretely for the acquisition of sparse signals, where the asymptotic relationship between resolution and the number of measurements is studied for algorithms with various complexities. The sparse signal recovery problem corresponds perfectly to a model of random multiple access communication in which the task of the receiver is to determine only the subset of the users that transmitted. This connection gives insight into the performance of single- and multi-user detection algorithms. Specifically, all known practical multiuser detection techniques become interference-limited at high SNR. However, power control is natural in the multiple access setting, and we are able to prove that a simple detection algorithm based on orthogonal matching pursuit is not interference-limited under optimal power control. The talk is based on joint work with Alyson Fletcher and Sundeep Rangan. top of page The Smoothed Linear Program for Approximate Dynamic Programming (abstract) Vivek F. Farias (MIT Sloan) We present a novel linear program for the approximation of the dynamic programming value function in high-dimensional stochastic control problems. LP approaches to approximate DP naturally restrict attention to approximations that, depending on the context, are upper or lower bounds to the optimal value function. Our program -- the `smoothed LP' -- relaxes this restriction in an appropriate fashion while remaining computationally tractable. Doing so appears to have several advantages: (i) we demonstrate superior bounds on the quality of approximation to the optimal value function afforded by our approach; (ii) experiments with our approach on a challenging problem (the game of Tetris) show that the approach outperforms a variety of approximate DP algorithms (including the LP approach, TD-learning and policy gradient methods) by a significant margin. Joint work with Vijay Desai and Ciamac Moallemi (Columbia).
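For context, the classical (unsmoothed) LP approach to exact DP minimizes the sum of state values subject to the Bellman inequalities V(s) >= R(s,a) + gamma * sum_{s'} P(s'|s,a) V(s'). A sketch on a made-up two-state MDP (my own toy instance, cross-checked against value iteration; requires scipy):

```python
import numpy as np
from scipy.optimize import linprog

# Toy 2-state, 2-action MDP (made up for illustration).
gamma = 0.9
R = np.array([[1.0, 0.0],          # R[s, a]
              [0.0, 2.0]])
P = np.array([[[0.8, 0.2],         # P[s, a, s']
               [0.1, 0.9]],
              [[0.5, 0.5],
               [0.2, 0.8]]])

# Exact-DP LP: minimize sum_s V(s)  s.t.  V(s) >= R[s,a] + gamma * P[s,a] @ V,
# i.e.  (gamma * P[s,a] - e_s) @ V <= -R[s,a]  for every pair (s, a).
n_s, n_a = R.shape
A_ub, b_ub = [], []
for s in range(n_s):
    for a in range(n_a):
        row = gamma * P[s, a].copy()
        row[s] -= 1.0
        A_ub.append(row)
        b_ub.append(-R[s, a])
res = linprog(c=np.ones(n_s), A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * n_s)
V_lp = res.x

# Cross-check against plain value iteration.
V = np.zeros(n_s)
for _ in range(2000):
    V = (R + gamma * P @ V).max(axis=1)
print(np.allclose(V_lp, V, atol=1e-4))
```

In approximate DP the variables V are replaced by a low-dimensional basis expansion, which forces the LP solution to one side of the optimal value function; the smoothed LP in the talk relaxes exactly those constraints.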
top of page Fractional simple random walk (abstract) Scott Sheffield (MIT Math) Fractional Brownian motions are the most natural generalizations of ordinary (one-dimensional) Brownian motions that allow for some amount of long-range dependence (a.k.a. "momentum effects"). They are sometimes used in mathematical finance as models for logarithmic asset prices. We describe some natural simple random walks on the integers that have fractional Brownian motion as a scaling limit. In a sense, these walks are the most natural discrete analogs of fractional Brownian motion. top of page A New Look at the Compound Poisson Distribution and Compound Poisson Approximation using Entropy (abstract) Mokshay Madiman (Yale University) We develop an information-theoretic foundation for compound Poisson approximation and limit theorems (analogous to the corresponding developments for the central limit theorem and for simple Poisson approximation). First, sufficient conditions are given under which the compound Poisson distribution has maximal entropy within a natural class of probability measures on the nonnegative integers. In particular, it is shown that a maximum entropy property is valid if the measures under consideration are log-concave, but that it fails in general. Second, approximation bounds in the (strong) relative entropy sense are given for distributional approximation of sums of independent nonnegative integer-valued random variables by compound Poisson distributions. The proof techniques involve the use of a notion of local information quantities that generalize the classical Fisher information used for normal approximation, as well as the use of ingredients from Stein's method for compound Poisson approximation. This work is joint with Andrew Barbour (Zurich), Oliver Johnson (Bristol) and Ioannis Kontoyiannis (AUEB).
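The flavor of such approximation results can be checked numerically by comparing the exact law of a sum of i.i.d. nonnegative integer-valued variables with its compound Poisson counterpart, the latter computed by Panjer's recursion. A sketch with made-up parameters, using total variation distance rather than relative entropy for simplicity:

```python
import numpy as np

n, p = 30, 0.1
severity = {1: 0.6, 2: 0.4}                # made-up jump-size distribution

# pmf of a single summand X = Bernoulli(p) * J, with J ~ severity.
step = np.zeros(3)
step[0] = 1 - p
for j, qj in severity.items():
    step[j] = p * qj

# Exact pmf of S_n = X_1 + ... + X_n by repeated convolution.
pmf_sum = np.array([1.0])
for _ in range(n):
    pmf_sum = np.convolve(pmf_sum, step)

# Compound Poisson CP(lam = n*p, same severity) via Panjer's recursion:
#   f(0) = exp(-lam),   f(k) = (lam / k) * sum_j  j * q_j * f(k - j).
lam = n * p
K = len(pmf_sum)                           # truncate at 2n; the CP tail beyond is negligible
f = np.zeros(K)
f[0] = np.exp(-lam)
for k in range(1, K):
    f[k] = (lam / k) * sum(j * qj * f[k - j] for j, qj in severity.items() if k >= j)

tv = 0.5 * np.abs(pmf_sum - f).sum()       # total variation distance (truncated support)
print(tv < 0.1)
```

The small distance here is consistent with classical bounds of order n*p^2 for Poisson-type approximation; the talk's relative entropy bounds quantify the same closeness in a stronger sense.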
top of page Sequential algorithms for generating random graphs (abstract) Mohsen Bayati (Microsoft Research New England) Large-scale complex networks have been the objects of study for the past two decades, and one central problem has been constructing or designing realistic models for such networks. This problem appears in a variety of applications including coding theory and systems biology. Unfortunately, the existing algorithms for this problem have large running times, making them impractical to use for networks with millions of links. We will present a technique to design an almost linear time algorithm for generating random graphs with given degree sequences for a range of degrees. Next, we extend the analysis to design an algorithm for efficiently generating random graphs without short cycles. These algorithms can be used for constructing high-performance low-density parity-check codes (LDPC codes). Based on joint works with Jeong Han Kim, Andrea Montanari, and Amin Saberi. top of page Convergence of unitary matrix integrals (abstract) Benoît Collins (University of Ottawa) We introduce the unitary Schwinger-Dyson equation associated with a self-adjoint polynomial potential V. The V=0 case corresponds to the free product state, so the Schwinger-Dyson equation can be considered as a deformation of free probability. We show that the solutions of this equation are unique for small V's and correspond to a large N limit of a multi-matrix model. This technique allows one to show that a wide class of unitary and orthogonal multi-matrix models converge asymptotically. We also give a combinatorial and analytic description of the limit, solving a series of open questions raised in theoretical physics in the late 1970s. This is joint work with A. Guionnet and E. Maurel-Segala.
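Such matrix integrals can be explored numerically by sampling Haar-distributed unitaries, for instance via the phase-corrected QR decomposition of a complex Ginibre matrix. A sketch (my own illustration) estimating E|Tr U|^2, whose exact value under Haar measure is 1 in every dimension:

```python
import numpy as np

def haar_unitary(n, rng):
    """Haar-distributed U(n) matrix: QR of a complex Ginibre matrix, with the
    phases of R's diagonal absorbed into Q to fix the distribution."""
    g = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(g)
    d = np.diagonal(r)
    return q * (d / np.abs(d))

rng = np.random.default_rng(1)
samples = [abs(np.trace(haar_unitary(6, rng))) ** 2 for _ in range(4000)]
print(abs(np.mean(samples) - 1.0))   # small: E|Tr U|^2 = 1 exactly under Haar measure
```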
top of page The Gaussian random walk, sampling Brownian motion, and the Riemann zeta function (abstract) Johan van Leeuwaarden (Eindhoven University of Technology, EURANDOM, NYU Stern) We consider the Gaussian random walk (one-dimensional random walk with normally distributed increments), and in particular the moments of its maximum M. Explicit expressions for all moments of M are derived in terms of Taylor series with coefficients that involve the Riemann zeta function. We build upon the work of Chang and Peres (1997) on P(M=0) and Bateman's formulas on Lerch's transcendent. Our result for E(M) completes earlier work of Kingman (1965), Siegmund (1985), and Chang and Peres (1997). The maximum M shows up in a range of applications, such as sequentially testing for the drift of a Brownian motion, corrected diffusion approximations, simulation of Brownian motion, option pricing, thermodynamics of a polymer chain, and queueing systems in heavy traffic. Some of these applications are discussed, as well as several issues open for further research. This talk is based on joint work with A.J.E.M. Janssen. top of page Quantum Computation by Adiabatic Evolution (abstract) Ed Farhi (MIT Physics) The quantum adiabatic algorithm is a general approach to solving combinatorial search problems using a quantum computer. It will succeed if the run time is long enough. I will discuss how this algorithm works and our current understanding of the required run time. top of page Large deviations for random walks under subexponentiality: the big-jump domain (abstract)(slides) Ton Dieker (Georgia Tech I&SE) Stimulated by applications to internet traffic modeling and insurance mathematics, distributions with heavy tails have been widely studied over the past decades. This talk addresses a fundamental large-deviation problem for random walks with heavy-tailed step-size distributions. 
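For such heavy-tailed steps the governing mechanism is the single-big-jump principle: P(S_n > x) is asymptotically n P(X_1 > x), because the rare event is typically caused by one huge increment. A Monte Carlo sketch with centered Pareto steps (my own illustration; all parameters are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
n, x, trials = 10, 200.0, 200_000
alpha = 1.5                                # P(X > t) = t**(-alpha) for t >= 1 (classical Pareto)
mean = alpha / (alpha - 1)                 # = 3

steps = rng.pareto(alpha, size=(trials, n)) + 1.0   # classical Pareto(alpha) samples
S = (steps - mean).sum(axis=1)             # centered random walk after n steps

lhs = (S > x).mean()                       # Monte Carlo estimate of P(S_n > x)
rhs = n * (x + mean) ** (-alpha)           # n * P(X_1 - mean > x): one big jump
print(lhs, rhs)                            # the two should be close at this level
```

The talk's question is precisely how large x must be, relative to n, for this approximation to be valid.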
We consider so-called subexponential step-size distributions, which constitute the most widely used class of heavy-tailed distributions. I will present a theory to study sequences {x_n} for which P{S_n>x_n} behaves asymptotically like n P{S_1>x_n} for large n. (joint work with D. Denisov and V. Shneer) top of page Bounds on Stationary Expectations for Markov Processes (abstract) Peter Glynn (Stanford MS&E) Many performance engineering and operations research modeling formulations lead to Markov models in which the key performance measure is an expectation defined in terms of the stationary distribution of the process. In models of realistic complexity, it is often difficult to compute such expectations in closed form. In this talk, we will discuss a simple class of bounds for such stationary expectations, and describe some of the mathematical subtleties that arise in making such bounds rigorous. We will also discuss how such bounds can be used algorithmically. This work is joint with Denis Saure and Assaf Zeevi. top of page Fragility of Asymptotic Agreement under Bayesian Learning (abstract)(slides) Daron Acemoglu (MIT Econ) Under the assumption that individuals know the conditional distributions of signals given the payoff-relevant parameters, existing results conclude that, as individuals observe infinitely many signals, their beliefs about the parameters will eventually merge. We first show that these results are fragile when individuals are uncertain about the signal distributions: given any such model, a vanishingly small individual uncertainty about the signal distributions can lead to a substantial (non-vanishing) difference between the asymptotic beliefs. We then characterize the conditions under which a small amount of uncertainty leads only to a small amount of asymptotic disagreement.
According to our characterization, this is the case if the uncertainty about the signal distributions is generated by a family with "rapidly-varying tails" (such as the normal or the exponential distributions). However, when this family has "regularly-varying tails" (such as the Pareto, the log-normal, and the t-distributions), a small amount of uncertainty leads to a substantial amount of asymptotic disagreement. top of page Quantile and Probability Curves without Crossing (abstract)(slides) Victor Chernozhukov (MIT Econ) The most common approach to estimating conditional quantile curves is to fit a curve, typically linear, pointwise for each quantile. Linear functional forms, coupled with pointwise fitting, are used for a number of reasons including parsimony of the resulting approximations and good computational properties. The resulting fits, however, may not respect a logical monotonicity requirement -- that the quantile curve be increasing as a function of probability. This paper studies the natural monotonization of these empirical curves induced by sampling from the estimated non-monotone model and then taking the resulting conditional quantile curves, which by construction are monotone in the probability. This construction of monotone quantile curves may be seen as a bootstrap and also as a monotonic rearrangement of the original non-monotone function. It is shown that the monotonized curves are closer to the true curves in finite samples, for any sample size. Under correct specification, the rearranged conditional quantile curves have the same asymptotic distribution as the original non-monotone curves. Under misspecification, however, the asymptotics of the rearranged curves may partially differ from the asymptotics of the original non-monotone curves. An analogous procedure is developed to monotonize the estimates of conditional distribution functions.
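The rearrangement step itself is computationally trivial: sort the estimated quantile curve in the probability index. A sketch (my own illustration) checking monotonicity and the finite-sample improvement property:

```python
import numpy as np

def rearrange(quantiles):
    """Monotone rearrangement: sort the estimated quantile curve in the
    probability index, leaving its set of values intact."""
    return np.sort(quantiles)

# A deliberately non-monotone "estimated" quantile curve on a probability grid.
u = np.linspace(0.05, 0.95, 19)
q_hat = u + 0.2 * np.sin(12 * u)          # crosses itself: not increasing everywhere
q_mono = rearrange(q_hat)

print(np.all(np.diff(q_mono) >= 0))       # monotone by construction
# Sorting can only move the curve closer (in L2) to any increasing target,
# here the true curve u -- the finite-sample improvement property.
print(np.linalg.norm(q_mono - u) <= np.linalg.norm(q_hat - u))
```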
The results are derived by establishing the compact (Hadamard) differentiability of the monotonized quantile and probability curves with respect to the original curves in discontinuous directions, tangentially to a set of continuous functions. In doing so, the compact differentiability of the rearrangement-related operators is established. top of page Exponential Error Bounds for Random Codes on the BSC (abstract) David Forney (MIT LIDS) One of Shannon's earliest results was his determination of the capacity of the binary symmetric channel (BSC). Shannon went on to show that, with randomly chosen codes and optimal decoding, the probability of decoding error decreases exponentially for any transmission rate less than capacity. Much of the important early work of Shannon, Elias, Fano and Gallager was devoted to determining bounds on the corresponding "error exponent." A later approach to this problem, pioneered by Csiszar and Korner, and now adopted in many modern information theory courses such as 6.441, determines error exponents using large-deviation theory. This approach has several advantages: (a) It can be remarkably simple and intuitive; (b) It gives correct error exponents, not just bounds; (c) It gives insight into how decoding errors typically occur. We illustrate this approach by developing the correct error exponents at all rates for both random codes and random linear codes on the BSC, assuming optimal decoding. We discuss how decoding errors typically occur. In particular, we show why the classical algebraic coding approach never had any hope of approaching channel capacity, even on the BSC. This talk is intended to be pitched at an elementary and tutorial level. top of page