What You Can Learn When You Know Almost Nothing
Intro to Learning Theory
Ali Rahimi (ali.rahimi@intel.com)
MW 2:00-3:30
Course number: cse599r Location: cse 303
Introduction
This is a computer science and math class about how and why learning
(by computers, animals, or humans) is possible. We take it as a
casual fact that things can be learned (that machines can learn to
recognize objects, robots can learn to play tennis, humans can learn
to feed themselves and not get mauled by tigers), but fundamentally,
all learning systems must make regularity assumptions: For example,
that the world is "smooth", that things that appear similar to our
senses behave similarly, that no evil genius adversarially
manipulates our senses.
Learning Theory quantifies these assumptions. It provides mathematical
bounds on what can be learned, with how much data, given the
assumptions we're willing to make. More importantly, it sheds light on
how to design good learning algorithms.
Philosophy is unfashionable. Besides, the most truthful philosophy is
mathematics, and the most honest mathematicians are engineers. So in
this class, we'll view these issues of learning under a mathematical
lens, and prove to ourselves that this math is useful by evaluating
the lessons we learn on computers.
Warning: This is my first time teaching a class on this topic.
Prerequisites
Please be comfortable with basic probability theory (distributions,
expectations, Bayes rule, conditioning, marginalizing), linear algebra
(least squares, eigenvalues), and optimization (gradient descent,
linear programs, quadratic programs, convexity, polyhedral sets). A
course in analysis will be helpful but not necessary.
Reading Material
We'll supplement the Yellow Terror with some more modern material from
survey papers.
- Yellow Terror: "A Probabilistic theory of pattern recognition", Devroye, Gyrofi, Lugosi.
- "Theory of classification: A survey of some recent advances", Boucheron, Bousquet, Lugosi.
- "Introduction to statistical learning theory", Bousquet, Boucheron, Lugosi.
Grading
Grades are based on regular homeworks (50%) and a final project (50%).
I will teach a lot of the deep ideas via homework exercises, so it's
worth spending a lot of time on them.
Outline
Number of lectures dedicated to each topic appears in parens.
- Introduction (1)
- Algorithms (1)
- k-Nearest neighbors and untrained rules
- Empirical risk minimization with linear regression,
kernel machines, and boosting
- Warmup: Asymptotic Expected Risk of Nearest Neighbors (1)
- Bad news, lower bounds, and depressing results (2)
- Computational: When Empirical Risk Minimization is inapproximable (Amaldi and Kann)
- Statistical: Learning a good classifier is impossible.
- Function approximation (1)
- The Maurey-Barron-Jones lemma
- Barron's sigmoids
- Jones' ridges
- Girosi and Anzelloti's translates of Gaussians
- Reproducing Kernel Hilbert Spaces
- Generalization properties of Empirical Risk Minimization
- Preliminary concepts (1)
- Estimating the risk of a single classifier using a Hoeffding bound
- Deviation between optima vs. uniform deviation over a class
- Estimation error vs. approximation error
- Some stronger concentration bounds we will need later:
- Hoeffding, McDiarmid, Kolmogorov, Bennet, Bernstein, and martingales
- VC generalization bounds (1)
- Vapnik and Chervonenkis' bound
- The VC dimension of some function classes
- Implication for Uniform Glivenko-Cantelliness of classes
- Covering number bounds (1)
- Pollard's bound
- Using Maurey-Barron-Jones to cover some Hilbert spaces.
- Rademacher complexity (2)
- The Rademacher complexity of some function classes
- Bounds that use algorithmic stability (probably skip this)
- Bousquet & Elisseeff's bound
- Revisiting the bad news (1)
- The perceptron rule.
- Approximating the 0-1 loss with a surrogate convex losses.
(Problem 2.12 of YT, Zhang, Bartlett-McAuliffe-Jordan)
- Important (modern, well-understood, useful, and actively
researched) concepts in learning theory that we will not cover
- Online learning
- Dependent data
- Active learning
- Transductive and semi-supervised learning
Other learning theory courses
You may find (and I do find) the lecture notes from these courses
helpful: