seminar

- Speaker: Daniel Hsu
- Title: Predictive models from interpolation
- Abstract: Recent experiments with non-linear machine learning methods demonstrate the generalization ability of classification and regression models that interpolate noisy training data. It is difficult for existing generalization theory to explain these observations. On the other hand, there are classical examples of interpolating methods with non-trivial risk guarantees, the nearest neighbor rule being one of the most well-known and best-understood. I'll describe a few other such interpolating methods (old and new) with stronger risk guarantees compared to nearest neighbor in high dimensions. In some cases, we can demonstrate minimax optimal rates of convergence with interpolating methods. This counters the conventional wisdom that interpolation necessarily implies poor generalization.
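As a small illustration of the nearest neighbor rule mentioned in the abstract, the sketch below shows why it interpolates: each training point is its own nearest neighbor, so the training error is zero whatever the labels are. The function name and the toy data are hypothetical and are not taken from the talk.

```python
def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the training point closest to `query`
    (squared Euclidean distance)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[best]


# Hypothetical toy data: four distinct points with arbitrary labels.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
train_y = [0, 1, 1, 0]

# Every training point is its own nearest neighbor, so the rule
# reproduces the training labels exactly, noisy or not.
training_error = sum(
    nearest_neighbor_predict(train_x, train_y, x) != y
    for x, y in zip(train_x, train_y)
)
```

The zero training error holds for any labeling as long as the training points are distinct.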

- Speaker: Phil Long
- Title: The singular values of convolutional layers
- Abstract: We characterize the singular values of the linear transformation associated with a standard 2D multi-channel convolutional layer, enabling their efficient computation. This characterization also leads to an algorithm for projecting a convolutional layer onto an operator-norm ball. We show that this is an effective regularizer; for example, it improves the test error of a deep residual network on CIFAR-10 from 6.2% to 5.3%. This is joint work with Hanie Sedghi and Vineet Gupta.
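For readers who want to experiment, here is a minimal NumPy sketch of one way to compute these singular values. It assumes circular (wrap-around) padding and a kernel laid out as (height, width, input channels, output channels); under that assumption the operator is block-diagonalized by the 2D Fourier transform, so its singular values are those of the small per-frequency matrices. The function name is hypothetical, and this is a sketch rather than the speakers' reference implementation.

```python
import numpy as np


def conv_singular_values(kernel, input_shape):
    """Singular values of a circular 2D multi-channel convolution.

    kernel: array of shape (k, k, c_in, c_out)
    input_shape: (n, n), the spatial size of the input feature map
    Returns an array of shape (n, n, min(c_in, c_out)).
    """
    # Zero-pad the kernel to the input size and take the 2D FFT over the
    # spatial axes; this yields one c_in x c_out matrix per frequency.
    transforms = np.fft.fft2(kernel, input_shape, axes=[0, 1])
    # np.linalg.svd operates on the last two axes of a stacked array.
    return np.linalg.svd(transforms, compute_uv=False)
```

The largest entry of the returned array is the operator norm of the layer under the circular-padding assumption.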

- Speaker: Adam Klivans
- Title: Frontiers of Efficient Neural-Network Learnability
- Abstract: What is the most expressive class of neural networks that can be learned, provably, in polynomial time in a distribution-free setting? In this talk we will describe how to combine isotonic regression with kernel methods to give efficient algorithms for learning neural networks with two nonlinear layers. We will touch upon relationships with recent work on SGD plus overparameterization.

- Speaker: Rina Panigrahy
- Title: Memory, Modularity, and the Theory of Deep Learnability
- Abstract: Why does deep learning work well for some applications and not for others? Do we need major architectural changes in deep learning to solve complex problems like natural language understanding and logic? Do memory and modular organization play an important role, and if so, how do we store complex concepts in memory? We will try to get a conceptual understanding of these questions by studying learning problems arising from synthetic mathematical function classes such as the learnability of polynomials, shallow teacher networks, and possible cryptographic hardness of learning deeper teacher networks. Finally, we will present nascent ideas about how we should model memory and evolve a modular view of deep learning for higher-level cognitive functions.

- Speaker: Costis Daskalakis
- Title: Reducing AI Bias using Truncated Statistics
- Abstract: An emergent threat to the practical use of machine learning is the presence of bias in the data used to train models. Biased training data can result in models that make incorrect or disproportionately correct decisions, or that reinforce the injustices reflected in their training data. For example, recent works have shown that semantics derived automatically from text corpora contain human biases, and found that the accuracy of face and gender recognition systems is systematically lower for people of color and women.

While the root causes of AI bias are difficult to pin down, a common cause of bias is the violation of the pervasive assumption in machine learning and statistics that the training data are unbiased samples of an underlying “test distribution,” which represents the conditions that the trained model will encounter in the future. We present a practical framework, based on SGD and truncated statistics, for regression and classification targeting such settings, which identifies both the mechanism inducing the discrepancy between the training and test distributions, and a predictor that targets performance in the test distribution. Our framework provides computationally and statistically efficient algorithms for truncated density estimation and truncated linear, logistic and probit regression. We provide experiments to illustrate the efficacy of our framework in removing bias from gender classifiers.

(Based on joint works with Themis Gouleakis, Andrew Ilyas, Vasilis Kontonis, Sujit Rao, Christos Tzamos, Manolis Zampetakis)

- Speaker: Tengyu Ma
- Title: Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
- Abstract: Stochastic gradient descent with a large initial learning rate is a widely adopted method for training modern neural net architectures. Although a small initial learning rate allows for faster training and better test performance initially, the large learning rate achieves better generalization soon after the learning rate is annealed. Towards explaining this phenomenon, we devise a setting in which we can prove that a two-layer network trained with a large initial learning rate and annealing generalizes better than the same network trained with a small learning rate from the start. The key insight in our analysis is that the order of learning different types of patterns is crucial: because the small learning rate model first memorizes low noise, hard-to-fit patterns, it generalizes worse on higher noise, easier-to-fit patterns than its large learning rate counterpart. This concept translates to a larger-scale setting: we demonstrate that one can add a small patch to CIFAR-10 images that is immediately memorizable by a model with a small initial learning rate, but ignored by the model with a large learning rate until after annealing. Our experiments show that this causes the small learning rate model’s accuracy on unmodified images to suffer, as it relies too much on the patch early on.

- Speaker (30 minutes): Shafi Goldwasser, Manfred Warmuth

**Speaker (11:00-11:30):** Ali Rahimi. **Title (11:00-11:30):** How To Beat Empirical Risk Minimization.

**Speaker (11:30-12:00):** Katherine Heller. **Title (11:30-12:00):** Improving medical predictions by caring about uncertainty: Combining deep neural networks with Bayesian methodology.

seminar.txt · Last modified: 2019/07/29 13:50 by matus