Learning Theory/State of the Art/ May 9-11th 2011

Courses
Shahar Mendelson

Professor at the Centre for Mathematics and Its Applications, The Australian National University, Canberra, Australia and at the Department of Mathematics, Technion, I.I.T, Haifa, Israel.

Home page

Objectives : Geometric methods in learning theory

Synopsis : Learning theory has been a playground for  empirical processes  techniques (i.e. understanding the learning rate of Empirical Risk Minimization (ERM) amounts to understand the oscillations of  the empirical process indexed by the right subclass). The course will start by surveying this connection. We will then consider classical parameters in learning (like the Rademacher/gaussian averages and combinatorial dimension), their properties and their role in asymptotic geometry. We will focus about the places where the boundedness assumption makes life easier, because the goal will be to remove the assumption. Then, I will pay attention to the Bernstein condition and the way it is connected to the convexity of the loss and the uniqueness of best approximation in the class, both for bounded as well as for unbounded classes of functions.
Finally and that will be the main topic, I will talk about learning unbounded function classes - in which I will focus on my more recent results. I will also present a couple of problems in geometry that need the same type of analysis - of empirical processes indexed by powers of unbounded function classes.
Old Videos : http://videolectures.net/mlss02_mendelson_sltep/

 

Ph.D. in applied mathematics from the Technion in 1998.

Current research interest is high dimensional phenomena. In particular, interested in the connections between Asymptotic Geometric Analysis, Empirical Processes Theory and Statistical Learning Theory.

CV, list of publications and some links.

 

 
Sanjoy DasGupta

Professor Computer Science UC San Diego

Home page

Objectives : Unsupervised and minimally-supervised learning.
What can be gleaned from data that has not been labeled by humans, or which has been labeled only very partially? I will give an overview of the current theoretical foundations of three major research areas of machine learning:  Clustering; Learning from data of low intrinsic dimension; Active learning

 

Synopsis
Clustering
A common way to arrive at an understanding of a novel data set is to cluster it. However, the practice of clustering is notoriously ill-founded and is dominated by heuristics with few guarantees. In what sense is the output of typical clustering algorithms at all meaningful?
We'll consider a precisely defined clustering problem, in which the data has been generated by an unknown mixture of k Gaussian sources. Over the past decade, there has been significant progress in developing algorithms that will work well in this scenario, and in understanding the complexity (in time and number of samples) of these schemes.
Slides of Lesson 1, Monday, the 9th of May
Learning from data of low intrinsic dimension
Nonparametric statistics (histograms, kernel density estimation, etc) is plagued by a curse of dimensionality: if data lie in d-dimensional space, then the number of samples needed to fit a good model generically scales exponentially with d. This is particularly unfortunate given that modern data sets are increasingly high dimensional.
One mitigating factor is that data that is apparently high-dimensional often has low "intrinsic" dimension. We'll discuss various notions of intrinsic dimension, as well as statistical and algorithmic techniques that require resources that scale only with this smaller dimension.
Active learning
A central problem of machine learning and statistics is the task of learning a classifier or predictor: that is, a map from the space of data (each data point might, for instance, be a summary of a credit card transaction) to the space of possible labels (whether or not the transaction is fraudulent, for instance). A key requirement for being able to learn a good classifier is having enough labeled data.
Frequently, unlabeled data is easily available but labels are expensive to come by. If each label has a non-negligible cost, is it possible to start with a large pool of unlabeled data and then adaptively decide which points to label, so that a good classifier is obtained at low cost?
Videos on clustering : http://videolectures.net/mlss09us_dasgupta_acp/

 

Works on algorithmic statistics, with a focus on unsupervised and minimally supervised learning.

 

Attachments:
Download this file (IHP2011_Dasgupta1-parta.pdf)IHP2011_Dasgupta1-parta.pdf[Course 1 part a Sanjoy Dasgupta]319 Kb
Download this file (IHP2011_Dasgupta1-partb.pdf)IHP2011_Dasgupta1-partb.pdf[Course 1 part b Sanjoy Dasgupta]856 Kb
Download this file (IHP2011_Dasgupta1.pdf)IHP2011_Dasgupta1.pdf[Slides of Sanjoy Dasgupta's first course 2011/05/09 (with overlays)]1261 Kb
Download this file (paris2.pdf)IHP2011_Dasgupta2.pdf[Slides of Sanjoy Dasgupta's second course 2011/05/10]1484 Kb
Download this file (paris3.pdf)IHP2011_Dasgupta3.pdf[Slides of Sanjoy Dasgupta's third course 2011/05/11]451 Kb
 
Peter Bartlett

Professor Statistics and Computer Science
UC Berkeley

Home page

http://www.stat.berkeley.edu/~bartlett/talks/ihp-may-2011.pdf
http://www.stat.berkeley.edu/~bartlett/talks/BeijingCourse2010.html
This short course will provide an introduction to the design and theoretical analysis of prediction methods, focusing on statistical and computational aspects. It will cover probabilistic and game theoretic formulations of prediction problems, and will examine questions about the guarantees we can prove about the performance of prediction algorithms and the inherent difficulty of prediction problems.

Synopsis:
- Formulation of online prediction problems in adversarial environments. Motivations.
- Finite comparison class: Prediction with expert advice. Halving algorithm. Exponential weights.
- Extensions. Statistical prediction with a finite class.
- Converting online strategies for adversarial environments to batch strategies for probabilistic environments.
- Online convex optimization: Problem formulation. The limitations of empirical minimization. Gradient methods. Regularized minimization. Bregman divergence. Linearization. Mirror descent.
- Regret bounds. Strongly convex losses.
- Log loss: universal portfolios, universal compression, prequential analysis. Normalized maximum likelihood. Sequential investment. Constantly rebalanced portfolios.

 

Peter Bartlett is a professor in the Division of Computer Science and Department of Statistics at the University of California at Berkeley. He is the co-author, with Martin Anthony, of the book Learning in Neural Networks: Theoretical Foundations, has edited three other books, and has co-authored many papers in the areas of machine learning and statistical learning theory. He has served as an associate editor of the journals Machine Learning, Mathematics of Control Signals and Systems, the Journal of Machine Learning Research, the Journal of Artificial Intelligence Research, and the IEEE Transactions on Information Theory, as a member of the editorial boards of Machine Learning, the Journal of Artificial Intelligence Research, and Foundations and Trends in Machine Learning, and as a member of the steering committees of the Conference on Computational Learning Theory and the Algorithmic Learning Theory Workshop. He has consulted to a number of organizations, including General Electric, Telstra, and SAC Capital Advisors. In 2001, he was awarded the Malcolm McIntosh Prize for Physical Scientist of the Year in Australia, for his work in statistical learning theory. He was a Miller Institute Visiting Research Professor in Statistics and Computer Science at U.C. Berkeley in Fall 2001, and a fellow, senior fellow and professor in the Research School of Information Sciences and Engineering at the Australian National University's Institute for Advanced Studies (1993-2003), and an honorary professor in the School of Information Technology and Electrical Engineering at the University of Queensland. His research interests include machine learning, statistical learning theory, and adaptive control.

 

 


Sponsors