Robustness in Machine Learning (CSE 599-M)

Instructor: Jerry Li
TA: Haotian Jiang
Time: Tuesday, Thursday 10:00-11:30 AM
Room: Gates G04
Office hours: by appointment, CSE 452

Course description

As machine learning is applied to increasingly sensitive tasks and to increasingly noisy data, it has become important that the algorithms we develop for ML be robust to potentially worst-case noise. In this class, we will survey a number of recent developments in the study of robust machine learning, from both theoretical and empirical perspectives. Tentatively, we will cover a number of related topics, including:

Learning in the presence of outliers. Techniques for learning when our training dataset is corrupted by worst-case noise. This includes robust statistics, list learning, and watermarking and data poisoning attacks.

Adversarial examples. Famously, neural network image classifiers can be fooled at test time by perturbing a test image by an imperceptible amount. We will discuss how such attacks work, empirical defenses against these attacks (e.g. adversarial training with PGD), and certifiable defenses which yield provable robustness.

Private machine learning. How can we develop algorithms for ML that respect the privacy of the users providing the data?

Our goal (though we will often fall short of it) is to devise theoretically sound algorithms for these tasks that transfer well to practice.
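
As a small illustration of why worst-case noise matters in the first topic above, here is a minimal sketch in Python (NumPy) comparing the empirical mean and the median on a one-dimensional Gaussian sample in which a 5% fraction of points has been replaced by a large value. The helper `corrupt`, the corruption fraction, and the outlier value are hypothetical choices made for this example only; they are not taken from the lectures, and this is a toy, not one of the algorithms covered in class.

```python
import numpy as np


def corrupt(samples, eps, outlier_value):
    """Return a copy of `samples` with an eps-fraction replaced by `outlier_value`.

    Hypothetical helper for illustration: a real adversary may choose
    the corrupted points far more cleverly than this.
    """
    corrupted = samples.copy()
    k = int(eps * len(samples))
    corrupted[:k] = outlier_value
    return corrupted


rng = np.random.default_rng(0)

# Clean data: 10,000 draws from a standard Gaussian, so the true mean is 0.
clean = rng.normal(loc=0.0, scale=1.0, size=10_000)

# Corrupted data: 5% of the points are moved to a single far-away value.
dirty = corrupt(clean, eps=0.05, outlier_value=1000.0)

# Error of each estimator relative to the true mean of 0.
mean_error = abs(np.mean(dirty))
median_error = abs(np.median(dirty))

print(f"error of empirical mean: {mean_error:.3f}")
print(f"error of median:         {median_error:.3f}")
```

The empirical mean can be moved arbitrarily far by making the outlier value larger, while the median shifts by a bounded amount that depends only on the corruption fraction. In one dimension the median is already a reasonable robust estimator; the harder question is what to do in high dimensions.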

The intended audience for this class is CS graduate students in Theoretical Computer Science and/or Machine Learning who are interested in doing research in this area. However, interested undergraduates and students from other departments are welcome to attend as well. The coursework will be light, consisting of some short problem sets and a final project.

For non-CSE students/undergraduates: If you are interested in this class, please attend the first lecture. If the material suits your interests and background, please request an add code from me afterwards.

Prerequisites

We will assume mathematical maturity and comfort with algorithms, probability, and linear algebra. Background in machine learning will be helpful but should not be necessary.

Project Proposals

Please turn in (or email) a one-page project proposal by November 12th. Projects can be reading projects, where you survey the literature on some area that we didn't cover, or research projects, where you try to tackle an open problem in the area (you do not necessarily have to succeed). Projects can be either theoretical or applied. If you're short of ideas, please feel free to ask the instructor!

Homework

Homework problems will be added as we cover the appropriate material. Each homework will be due one to two weeks after we finish the corresponding unit.

Homework 1. Due date: ~~Nov. 5~~ Nov. 12 pdf
Homework 2. Due date: Dec. 6 pdf

Lectures

Lecture notes are a work in progress. Feedback is welcome!

Lecture 0: Syllabus / administrative stuff (slightly outdated). notes
Lecture 1 (9/26): Introduction to robustness. notes
Lecture 2 (10/1): Total variation, statistical models, and lower bounds. notes
Lecture 3 (10/3): Robust mean estimation in high dimensions. notes
Lecture 4 (10/8): Spectral signatures and efficient certifiability. notes
Lecture 5 (10/10): Efficient filtering from spectral signatures. notes
Lecture 6 (10/15): Stronger spectral signatures for Gaussian datasets. notes
Lecture 7 (10/17): Efficient filtering from spectral signatures for Gaussian data. notes
Lecture 8 (10/22): Additional topics in robust statistics. notes
Lecture 9 (10/24): Introduction to adversarial examples. notes
Lecture 10 (10/29): Empirical defenses for adversarial examples. notes
Lecture 11 (10/31): The four worlds hypothesis: models for adversarial examples. notes
NO CLASS (11/05) to recover from the STOC deadline
Lecture 12 (11/07): Certified defenses I: Exact certification. notes
Lecture 13 (11/12): Certified defenses II: Convex relaxations. notes
Lecture 14 (11/14): Certified defenses III: Randomized smoothing. notes
Lecture 15 (11/19): Additional topics in robust deep learning. notes
Lecture 16 (11/21): Basics of differential privacy. notes
Lecture 17 (11/26): Differentially private estimation I: univariate mean estimation. notes
NO CLASS (11/28): Thanksgiving
Lecture 18 (12/3): (Guest lecture by Sivakanth Gopi) Differentially private estimation II: high dimensional estimation. notes
Lecture 19 (12/5): Additional topics in private machine learning. notes

Supplementary material

Principled Approaches to Robust Machine Learning and Beyond. My Ph.D. thesis.
Robust Learning: Information Theory and Algorithms. Jacob Steinhardt's Ph.D. thesis.
Jacob is also teaching a similar class at Berkeley this semester: link

Accommodations

Please refer to university policies regarding disability accommodations or religious accommodations.