Jerry Li

Microsoft Building 99
Redmond, WA 98052

jerrl AT microsoft DOT com

My CV (last updated 11/08/2023)

About

I am a principal research scientist in the PhysAGI (formerly Machine Learning Foundations) Group at Microsoft Research Redmond.

In Fall 2018 I was the VMware Research Fellow at the Simons Institute. I did my Ph.D. at MIT, where I was fortunate to work with Ankur Moitra. I also did my master's at MIT under the wonderful supervision of Nir Shavit.

My primary research interests are in learning theory, (very) broadly defined, including quantum information theory, the science of large foundation models, and high-dimensional statistics. I particularly like applications of analysis and analytic techniques to TCS problems.

As an undergrad at the University of Washington, I worked on the complexity of branching programs, and on proving hardness results for techniques used in naturally arising learning problems in database theory and AI.

Teaching

I taught a course on robust machine learning at UW in Fall 2019! See the course website for more details. I also made some video lectures that cover and expand upon some of the material from that course.


I am fortunate to have supervised the following amazing junior researchers:
  • Jaume de Dios Pont (Research Intern, Summer 2023), co-advised with Adil Salim
  • Jane Lange (Research Intern, Summer 2023)
  • Sidhanth Mohanty (Research Intern, Summer 2022)
  • Allen Liu (Research Intern, Summer 2021, Summer 2022)
  • Huiying Li (Research Intern, Summer 2020), co-advised with Ece Kamar and Emre Kıcıman
  • Kai Xiao (Research Intern, Summer 2020), co-advised with Sébastien Bubeck
  • Kevin Tian (Research Intern, Summer 2020)
  • Ivan Evtimov (Research Intern, Spring 2020), co-advised with Weidong Cui, Ece Kamar, and Emre Kıcıman
  • Tony Duan (MSR AI Resident, 2019–2020)
  • Hadi Salman (MSR AI Resident, 2018–2019)
  • Sitan Chen (Research Intern, Summer 2019)

Papers

Authors are ordered alphabetically unless they're not

Theses

Essays

  • Robustness Meets Algorithms
    Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, Alistair Stewart.
    Communications of the ACM May 2021, Research Highlights
    Technical Perspective: Jacob Steinhardt

Preprints

Journal Papers

  • The Complexity of NISQ
    Sitan Chen, Jordan Cotler, Hsin-Yuan Huang, Jerry Li
    to appear, Nature Communications
    preliminary version in QIP 2023

  • Quantum Advantage in Learning from Experiments
    Hsin-Yuan Huang, Michael Broughton, Jordan Cotler, Sitan Chen, Jerry Li, Masoud Mohseni, Hartmut Neven, Ryan Babbush, Richard Kueng, John Preskill, Jarrod R. McClean.
    Science, 376 (6598), 2022.

  • Robust Estimators in High Dimensions without the Computational Intractability
    Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, Alistair Stewart.
    SIAM Journal on Computing, 48(2), 2019. Special Issue for FOCS 2016.

  • Exact Model Counting of Query Expressions: Limitations of Propositional Methods
    Paul Beame, Jerry Li, Sudeepa Roy, Dan Suciu.
    ACM Transactions on Database Systems (TODS), Vol. 42, Issue 1, pages 1:1-1:46, March 2017.

Conference and Workshop Papers

  1. KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval
    Marah I Abdin, Suriya Gunasekar, Varun Chandrasekaran, Jerry Li, Mert Yuksekgonul, Rahee Ghosh Peshawaria, Ranjita Naik, Besmira Nushi
    to appear, ICLR 2024

  2. Automatic Prompt Optimization with "Gradient Descent" and Beam Search
    Reid Pryzant, Dan Iter, Jerry Li, Yin Tat Lee, Chenguang Zhu, Michael Zeng
    EMNLP 2023

  3. Structured Semidefinite Programming for Recovering Structured Preconditioners
    Arun Jambulapati, Jerry Li, Christopher Musco, Kirankumar Shiragur, Aaron Sidford, Kevin Tian
    NeurIPS 2023
    preliminary version in OPT 2022

  4. The Full Landscape of Robust Mean Testing: Sharp Separations between Oblivious and Adaptive Contamination
    Clément L. Canonne, Samuel B. Hopkins, Jerry Li, Allen Liu, Shyam Narayanan
    FOCS 2023
    Invited to appear in special issue of SIAM Journal on Computing for FOCS 2023

  5. Matrix Completion in Almost-Verification Time
    Jon Kelner, Jerry Li, Allen Liu, Aaron Sidford, Kevin Tian
    FOCS 2023

  6. When Does Adaptivity Help for Quantum State Learning?
    Sitan Chen, Brice Huang, Jerry Li, Allen Liu, Mark Sellke
    FOCS 2023
    preliminary version in QIP 2023, merged with this paper

  7. Query lower bounds for log-concave sampling
    Sinho Chewi, Jaume de Dios Pont, Jerry Li, Chen Lu, Shyam Narayanan
    FOCS 2023

  8. Semi-Random Sparse Recovery in Nearly-Linear Time
    Jon Kelner, Jerry Li, Allen Liu, Aaron Sidford, Kevin Tian
    COLT 2023

  9. Sampling Is as Easy as Learning the Score: Theory for Diffusion Models With Minimal Data Assumptions
    Sitan Chen, Sinho Chewi, Jerry Li, Yuanzhi Li, Adil Salim, Anru R. Zhang
    ICLR 2023, Notable top 5%

  10. Learning Polynomial Transformations
    Sitan Chen, Jerry Li, Yuanzhi Li, Anru R. Zhang
    STOC 2023

  11. REAP: A Large-Scale Realistic Adversarial Patch Benchmark
    Nabeel Hingun, Chawin Sitawarin, Jerry Li, David Wagner
    ICCV 2023

  12. Learning (Very) Simple Generative Models Is Hard
    Sitan Chen, Jerry Li, Yuanzhi Li
    NeurIPS 2022

  13. Robust Model Selection and Nearly-Proper Learning for GMMs
    Jerry Li, Allen Liu, Ankur Moitra
    NeurIPS 2022

  14. Tight Bounds for Quantum State Certification with Incoherent Measurements
    Sitan Chen, Brice Huang, Jerry Li, Allen Liu
    FOCS 2022
    QIP 2023, merged with this paper

  15. The Price of Tolerance in Distribution Testing
    Clément Canonne, Gautam Kamath, Ayush Jain, Jerry Li
    COLT 2022

  16. Clustering Mixtures with Almost Optimal Separation in Polynomial Time
    Jerry Li, Allen Liu
    STOC 2022
    Invited to appear in special issue of SIAM Journal on Computing for STOC 2022

  17. Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation
    Ilias Diakonikolas, Daniel M. Kane, Daniel Kongsgaard, Jerry Li, Kevin Tian
    STOC 2022

  18. Minimax Optimality (Probably) Doesn't Imply Distribution Learning for GANs
    Sitan Chen, Jerry Li, Yuanzhi Li, Raghu Meka
    ICLR 2022

  19. Toward Instance-Optimal State Certification With Incoherent Measurements
    Sitan Chen, Jerry Li, Ryan O'Donnell
    preliminary version in QIP 2022
    COLT 2022

  20. Robust Regression Revisited: Acceleration and Improved Estimation Rates
    Arun Jambulapati, Jerry Li, Tselil Schramm, Kevin Tian
    NeurIPS 2021

  21. List-Decodable Mean Estimation in Nearly-PCA Time
    Ilias Diakonikolas, Daniel M. Kane, Daniel Kongsgaard, Jerry Li, Kevin Tian
    NeurIPS 2021, Spotlight Presentation

  22. A Hierarchy for Replica Quantum Advantage
    Sitan Chen, Jordan Cotler, Hsin-Yuan Huang, Jerry Li
    QIP 2022, merged with [CCHL21]

  23. Exponential Separations between Learning With and Without Quantum Memory
    Sitan Chen, Jordan Cotler, Hsin-Yuan Huang, Jerry Li
    FOCS 2021
    QIP 2022
    Invited to appear in special issue of SIAM Journal on Computing for FOCS 2021

  24. Finding the Mode of a Kernel Density Estimate
    Jasper C.H. Lee, Jerry Li, Christopher Musco, Jeff M. Phillips, Wai Ming Tai
    ESA 2021

  25. Statistical Query Algorithms and Low-Degree Tests Are Almost Equivalent
    Matthew Brennan, Guy Bresler, Samuel B. Hopkins, Jerry Li, Tselil Schramm
    COLT 2021, Best Paper Runner Up

  26. Aligning AI With Shared Human Values
    Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, Jacob Steinhardt
    ICLR 2021

  27. Byzantine-Resilient Non-Convex Stochastic Gradient Descent
    Dan Alistarh, Zeyuan Allen-Zhu, Faeze Ebrahimianghazani, Jerry Li
    ICLR 2021

  28. Robust and Heavy-Tailed Mean Estimation Made Simple, via Regret Minimization
    Samuel B. Hopkins, Jerry Li, Fred Zhang
    NeurIPS 2020

  29. Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing
    Arun Jambulapati, Jerry Li, Kevin Tian
    NeurIPS 2020, Spotlight Presentation

  30. Learning Structured Distributions From Untrusted Batches: Faster and Simpler
    Sitan Chen, Jerry Li, Ankur Moitra
    NeurIPS 2020

  31. Robust Covariance Estimation in Nearly-Matrix Multiplication Time
    Jerry Li, Guanghao Ye
    NeurIPS 2020

  32. Entanglement is Necessary for Optimal Quantum Property Testing
    Sébastien Bubeck, Sitan Chen, Jerry Li
    FOCS 2020

  33. Randomized Smoothing of All Shapes and Sizes
    Greg Yang, Tony Duan, Edward Hu, Hadi Salman, Ilya Razenshteyn, Jerry Li
    ICML 2020

  34. Positive Semidefinite Programming: Mixed, Parallel, and Width-Independent
    Arun Jambulapati, Yin Tat Lee, Jerry Li, Swati Padmanabhan, Kevin Tian
    STOC 2020

  35. Learning Mixtures of Linear Regressions in Subexponential Time via Fourier Moments
    Sitan Chen, Jerry Li, Zhao Song
    STOC 2020

  36. Efficiently Learning Structured Distributions from Untrusted Batches
    Sitan Chen, Jerry Li, Ankur Moitra
    STOC 2020

  37. Low-rank Toeplitz Matrix Estimation via Random Ultra-Sparse Rulers
    Hannah Lawrence, Jerry Li, Cameron Musco, Christopher Musco
    ICASSP 2020

  38. The Sample Complexity of Toeplitz Covariance Estimation
    Yonina Eldar, Jerry Li, Cameron Musco, Christopher Musco
    SODA 2020

  39. Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers
    Hadi Salman, Greg Yang, Jerry Li, Pengchuan Zhang, Huan Zhang, Ilya Razenshteyn, Sébastien Bubeck
    NeurIPS 2019, Spotlight Presentation

  40. Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection
    Yihe Dong, Samuel B. Hopkins, Jerry Li
    NeurIPS 2019, Spotlight Presentation

  41. SEVER: A Robust Meta-Algorithm for Stochastic Optimization
    Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Jacob Steinhardt, Alistair Stewart
    preliminary version in SecML 2018, Oral Presentation
    ICML 2019

  42. How Hard is Robust Mean Estimation?
    Samuel B. Hopkins, Jerry Li
    COLT 2019

  43. On Mean Estimation For General Norms with Statistical Queries
    Jerry Li, Aleksandar Nikolov, Ilya Razenshteyn, Erik Waingarten
    COLT 2019

  44. Privately Learning High-Dimensional Distributions
    Gautam Kamath, Jerry Li, Vikrant Singhal, Jonathan Ullman
    preliminary version in TPDP 2018
    COLT 2019

  45. Spectral Signatures for Backdoor Attacks
    Brandon Tran, Jerry Li, Aleksander Mądry
    NeurIPS 2018

  46. Byzantine Stochastic Gradient Descent
    Dan Alistarh, Zeyuan Allen-Zhu, Jerry Li
    NeurIPS 2018

  47. On the limitations of first order approximation in GAN dynamics
    Jerry Li, Aleksander Mądry, John Peebles, Ludwig Schmidt
    preliminary version in PADL 2017 as Towards Understanding the Dynamics of Generative Adversarial Networks
    ICML 2018

  48. Fast and Sample Near-Optimal Algorithms for Learning Multidimensional Histograms
    Ilias Diakonikolas, Jerry Li, Ludwig Schmidt
    COLT 2018

  49. Distributionally Linearizable Data Structures
    Dan Alistarh, Trevor Brown, Justin Kopinsky, Jerry Li, Giorgi Nadiradze
    SPAA 2018

  50. Mixture Models, Robustness, and Sum of Squares Proofs
    Samuel B. Hopkins, Jerry Li
    STOC 2018

  51. Robustly Learning a Gaussian: Getting Optimal Error, Efficiently
    Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Jerry Li, Ankur Moitra, Alistair Stewart
    SODA 2018

  52. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks
    Dan Alistarh, Demjan Grubić, Jerry Li, Ryota Tomioka, Milan Vojnovic
    preliminary version in OPT 2016
    NIPS 2017, Spotlight Presentation
    Invited for presentation at NVIDIA GTC
    [code] [poster] [video]

  53. Being Robust (in High Dimensions) can be Practical
    Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Jerry Li, Ankur Moitra, Alistair Stewart
    ICML 2017
    [code]

  54. ZipML: An End-to-end Bitwise Framework for Dense Generalized Linear Models
    Hantian Zhang*, Jerry Li*, Kaan Kara, Dan Alistarh, Ji Liu, Ce Zhang
    *equal contribution
    ICML 2017

  55. The Power of Choice in Priority Scheduling
    Dan Alistarh, Justin Kopinsky, Jerry Li, Giorgi Nadiradze
    PODC 2017

  56. Robust Sparse Estimation Tasks in High Dimensions
    Jerry Li
    COLT 2017
    merged with this paper

  57. Robust Proper Learning for Mixtures of Gaussians via Systems of Polynomial Inequalities
    Jerry Li, Ludwig Schmidt.
    COLT 2017

  58. Sample Optimal Density Estimation in Nearly-Linear Time
    Jayadev Acharya, Ilias Diakonikolas, Jerry Li, Ludwig Schmidt.
    SODA 2017
    TCS+ talk by Ilias, which discussed the piecewise polynomial framework and our results at a high level

  59. Robust Estimators in High Dimensions, without the Computational Intractability
    Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Jerry Li, Ankur Moitra, Alistair Stewart
    FOCS 2016
    Invited to Highlights of Algorithms 2017
    Invited to appear in special issue of SIAM Journal on Computing for FOCS 2016
    Invited to appear in Communications of the ACM Research Highlights
    MIT News, USC Viterbi News

  60. Fast Algorithms for Segmented Regression
    Jayadev Acharya, Ilias Diakonikolas, Jerry Li, Ludwig Schmidt
    ICML 2016

  61. Replacing Mark Bits with Randomness in Fibonacci Heaps
    Jerry Li, John Peebles.
    ICALP 2015

  62. Fast and Near-Optimal Algorithms for Approximating Distributions by Histograms
    Jayadev Acharya, Ilias Diakonikolas, Chinmay Hegde, Jerry Li, Ludwig Schmidt.
    PODS 2015

  63. The SprayList: A Scalable Relaxed Priority Queue
    Dan Alistarh, Justin Kopinsky, Jerry Li, Nir Shavit.
    PPoPP 2015, Best Artifact Award
    See also the full version
    [code]
    Slashdot, MIT News

  64. On the Importance of Registers for Computability
    Rati Gelashvili, Mohsen Ghaffari, Jerry Li, Nir Shavit.
    OPODIS 2014

  The following two papers are subsumed by the journal paper Exact Model Counting of Query Expressions: Limitations of Propositional Methods:

  65. Model Counting of Query Expressions: Limitations of Propositional Methods
    Paul Beame, Jerry Li, Sudeepa Roy, Dan Suciu.
    ICDT 2014
    Invited to appear in special issue of ACM Transactions on Database Systems for ICDT 2014.

  66. Lower bounds for exact model counting and applications in probabilistic databases
    Paul Beame, Jerry Li, Sudeepa Roy, Dan Suciu.
    UAI 2013, selected for plenary presentation.

Patents

  • Efficient training of neural networks
    Dan Alistarh, Jerry Li, Ryota Tomioka, Milan Vojnovic

Other Writing

Misc

  • I reached Challenger in Set 4.5 TFT. This is by far my proudest life accomplishment.

  • I am on the steering committee for SLOGN*

  • I organized the Great Ideas in Theoretical Computer Science (aka theory lunch) in the 2013-2014 academic year.

  • I stole the boombox from the Glorious Office 3 times, then promptly lost it back each time.

* This might be false