Introduction to Computational Learning Theory (COMP SCI 639)

Spring 2020

This course will focus on developing the core concepts and techniques of computational learning theory. We will examine the inherent abilities and limitations of learning algorithms in well-defined learning models. Specifically, the course will focus on algorithmic problems in supervised learning. The goal of supervised learning is to infer a function from a set of labeled observations. We will study algorithms for learning Boolean functions from labeled examples in a number of models (online learning, PAC learning, SQ learning, learning with noise, etc.).

Course Information

Instructor: Ilias Diakonikolas

Teaching Assistant: Nikos Zarifis (zarifis@wisc.edu)

Lectures: Tuesday, Thursday 1:00-2:15, COMP SCI 1325.

Prerequisites

Mathematical maturity. Background in undergraduate algorithms.

Course Outline

Here is an outline of the course material:

Online Learning: Winnow, Best Experts, Weighted Majority

PAC Learning, Relation to Online Learning, Occam's Razor

VC Dimension and Sample Complexity

Learning Decision Trees and DNFs

Boosting

Learning with Noise: Classification Noise, Malicious Noise

Statistical Query Learning

Distribution Dependent Learning, Fourier Transform

Computational Hardness of Learning

Learning with Membership and Equivalence Queries

Other Models of Learning: Semi-supervised Learning, Active Learning

Lectures

Lecture 1 (January 21) Introduction to computational learning theory. Supervised Learning.

Lecture 2 (January 23) Introduction to Online Learning.

Lecture 3 (January 28) Online Learning of Disjunctions and Decision Lists.

Lecture 4 (January 30) Winnow and Perceptron Algorithms.

Lecture 5 (February 4) More on Winnow and Perceptron. Introduction to VC dimension.

Lecture 6 (February 6) VC dimension and lower bound on online learning. Weighted Majority Algorithm.

Lecture 7 (February 11) Analysis of Weighted Majority. Randomized Weighted Majority and its Analysis.

Lecture 8 (February 13) Introduction to PAC Learning.

Lecture 9 (February 18) PAC Learning continued. Learning Intervals and Rectangles. Reduction from Online Learning to PAC Learning.

Lecture 10 (February 20) Finding a Consistent Hypothesis and Occam’s Razor. Cardinality version of Occam’s Razor.

Lecture 11 (February 25) Greedy Set Cover Heuristic. Using the cardinality version of Occam’s Razor to PAC learn sparse disjunctions with near-optimal sample complexity.

Lecture 12 (February 27) Hypothesis Testing. Basic Concentration Inequalities. Proper vs Non-proper PAC Learning.

Lecture 13 (March 3) Proper vs Non-proper Learning Continued. NP-hardness of properly learning 3-term DNFs. Efficient Algorithm for Non-proper Learning of 3-term DNFs.

Lecture 14 (March 5) VC Dimension Characterizes Sample Complexity of PAC Learning. Proof of Sample Complexity Lower Bound.

Lecture 15 (March 10) VC Dimension Characterizes Sample Complexity of PAC Learning. Sauer’s Lemma and Start of Sample Complexity Upper Bound Proof.

Lecture 16 (March 12) VC Dimension Characterizes Sample Complexity of PAC Learning. Proof of Sauer’s Lemma and Upper Bound Proof Continued.

Lecture 17 (March 24) Efficient Learning of Linear Threshold Functions. Introduction to Boosting. Schapire’s Three-Stage Booster.

Lecture 18 (March 26) Schapire’s Three-Stage Booster Continued.

Lecture 19 (March 31) Introduction to Boosting via Sampling Approach. Adaboost.

Lecture 20 (April 2) Adaboost Algorithm and Analysis Continued.

Lecture 21 (April 7) Introduction to Learning with Noise. Random Classification Noise, Malicious Noise.

Lecture 22 (April 9) Information-Theoretic Lower Bound on Learning with Malicious Noise. General Approach for Learning with Malicious Noise.

Lecture 23 (April 14) Random Classification Noise (RCN). Learning Monotone Disjunctions with RCN. Introduction to Statistical Query Model.

Lecture 24 (April 16) Statistical Query (SQ) Learning Model. SQ Learning Implies Learning with Random Classification Noise.

Lecture 25 (April 21) Hardness of Learning: Representation-Dependent vs Representation-Independent Hardness. Worst-Case vs Average-Case Assumptions.

Lecture 26 (April 23) Hardness of Learning Continued. Cryptographic Hardness.

Lecture 27 (April 28) Hardness of Learning Assuming Hardness of Refuting Random CSPs.

Lecture 28 (April 30) Additional Topics: Active Learning, Unsupervised Learning.

Course Evaluation

Homework Assignments: There will be 4 homework assignments that will count for 60% of the grade. The assignments will be proof-based and are intended to be challenging. Collaboration and discussion among students are allowed, though students must write up their solutions independently.

Course Project: A part of the course (25% of the grade) is an independent project on a topic related to learning theory. Projects can be completed individually or in groups of two students.

The goal of the project is to become an expert in an area related to the class material, and potentially contribute to the state of the art. There are two aspects to the course project. The first is a literature review: students must decide on a topic and a list of papers, understand these papers in depth, and write a survey presentation of the results in their own words. The second is a research component: students must identify a research problem or direction on the chosen topic, think about it, and describe the progress they make.

Students must consult with the instructor during the first half of the course for help in forming project teams, selecting a suitable project topic, and selecting a suitable set of research papers.

Students will be graded on the project proposal (5%), the progress report (5%), and the final report (15%).

The remaining part of the grade will be based on class participation (15%).

Readings

The textbook for this course is:

M. Kearns and U. Vazirani. An Introduction to Computational Learning Theory.

An additional textbook (available online) we will use is:

S. Shalev-Shwartz and S. Ben-David. Understanding Machine Learning: From Theory to Algorithms.