the MaD Seminar
The MaD seminar features leading specialists at the interface of Applied Mathematics, Statistics and Machine Learning.
Room: Auditorium Hall 150, Center for Data Science, NYU, 60 5th ave.
Time: 2:30pm-3:30pm, Reception will follow.
Subscribe to the Seminar Mailing list here
Schedule with Confirmed Speakers
|Jan 26||Dave Donoho (Stanford)||Optimal Shrinkage of Covariance Matrices in light of the spiked covariance model|
|Feb 2||Andrew Gelman (Columbia)||Taking Bayesian Inference Seriously|
|Leslie Greengard (Courant)||Inverse problems in acoustic scattering and cryo-electron microscopy|
|Feb 16||Ronald Coifman (Yale)||Organization and Analysis on data tensors|
|Feb 23||John Wright (Columbia)||Nonconvex Recovery of Low-Complexity Models|
|Mar 2||Gitta Kutyniok (TU Berlin)||Optimal Approximation with Sparse Deep Neural Networks|
|Mar 9||Philippe Rigollet (MIT)||TBA|
|Mar 16||SPRING BREAK|
|Mar 23||Amit Singer (Princeton)||TBA|
|Mar 30||Stephane Mallat (ENS Ulm)||TBA|
|Apr 6||Ben Recht (UC Berkeley)||TBA|
|Apr 13||Waheed Bajwa (Rutgers)||TBA|
|Apr 20||Andrea Montanari (Stanford)||TBA|
|Apr 27||Joel Tropp (Caltech)||TBA|
|May 4||Thomas Strohmer (UC Davis)||TBA|
(joint work with Behrooz Ghorbani, Stanford)
In recent years, there has been a great deal of excitement about ‘big data’ and about the new research problems posed by a world of vastly enlarged datasets. In response, the field of Mathematical Statistics increasingly studies problems where the number of variables measured is comparable to or even larger than the number of observations. Numerous fascinating mathematical phenomena arise in this regime; and in particular theorists discovered that the traditional approach to covariance estimation needs to be completely rethought, by appropriately shrinking the eigenvalues of the empirical covariance matrix.
This talk briefly reviews advances by researchers in random matrix theory who in recent years solved completely the properties of eigenvalues and eigenvectors under the so-called spiked covariance model. By applying these results it is now possible to obtain the exact optimal nonlinear shrinkage of eigenvalues for certain specific measures of performance, as has been shown in the case of Frobenius loss by Nobel and Shabalin, and for many other performance measures by Donoho, Gavish, and Johnstone.
In this talk, we focus on recent results of the author and Behrooz Ghorbani on optimal shrinkage for the condition number of the relative error matrix; this presents new subtleties. The exact optimal solutions will be described, and stylized applications to Muti-User Covariance estimation and Multi-Task Discriminant Analysis will be developed.
Over the years I have been moving toward the use of informative priors in more and more of my applications. I will discuss several examples from theory, application, and computing where traditional noninformative priors lead to disaster, but a little bit of prior information can make everything work out. Informative priors also can resolve some of the questions of replication and multiple comparisons that have recently shook the world of science. It’s funny for me to say this, after having practiced Bayesian statistics for nearly thirty years, but I’m only now realizing the true value of the prior distribution.
A variety of problems in image reconstruction give rise to large-scale, nonlinear and non-convex optimization problems. We will show how recursive linearization combined with suitable fast solvers are bringing such problems within practical reach, with an emphasis on acoustic scattering and protein structure determination via cryo-electron microscopy.
Our goal is to illustrate and give an overview of various emerging methodologies to geometrize tensor data and build analytics on that foundation.
Starting with conventional data bases given as matrices , where we organize simultaneously rows and columns , viewed as functions of each other . We extend the process to higher order tensors,on which we build joint geometries.
We will describe various applications to the study of questionnaires , medical and genetic data , neuronal dynamics in various regimes. In particular we will discuss a useful integration of these analytic tools with deep nets and the features they reveal.
Nonconvex optimization plays important role in wide range of areas of science and engineering — from learning feature representations for visual classification, to reconstructing images in biology, medicine and astronomy, to disentangling spikes from multiple neurons. The worst case theory for nonconvex optimization is dismal: in general, even guaranteeing a local minimum is NP hard. However, in these and other applications, very simple iterative methods such as gradient descent often perform surprisingly well.
In this talk, I will discuss examples of nonconvex optimization problems that can be solved to global optimality using simple iterative methods, which succeed independent of initialization. These include variants of the sparse dictionary learning problem, image recovery from certain types of phaseless measurements, and variants of the sparse blind deconvolution problem. These problems possess a characteristic structure, in which (i) all local minima are global, and (ii) the energy landscape does not have any “flat” saddle points. For each of the aforementioned problems, this geometric structure allows us to obtain new types of performance guarantees. I will motivate these problems from applications in imaging and computer vision, and describe how this viewpoint leads to new approaches to analyzing electron microscopy data.
Joint work with Ju Sun (Stanford), Qing Qu (Columbia), Yuqian Zhang (Columbia), Yenson Lau (Columbia) Sky Cheung, (Columbia), Abhay Pasupathy (Columbia)
Deep neural networks show impressive results in a variety of real-world applications. One central task of them is to approximate a function, which for instance encodes a classification problem. In this talk, we will be concerned with the question, how well a function can be approximated by a deep neural network with sparse connectivity, i.e., with a minimal number of edges. Using methods from approximation theory and applied harmonic analysis, we will first prove a fundamental lower bound on the sparsity of a neural network if certain approximation properties are required. By explicitly constructing neural networks based on certain representation systems, so-called $\alpha$-shearlets, we will then demonstrate that this lower bound can in fact be attained. Finally, given a fixed network topology with sparse connectivity, we present numerical experiments, which show that already the standard backpropagation algorithm generates a deep neural network obeying those optimal approximation rates. Interestingly, our experiments also show that restricting to subnetworks, the learning procedure even yields $\alpha$-shearlet-like functions. This is joint work with H. B\“olcskei (ETH Zurich), P. Grohs (Uni Vienna), and P. Petersen (TU Berlin).