Jerry Li

Microsoft Building 99
Redmond, WA 98052

jerrl AT microsoft DOT com

My CV (last updated 2/19/2018)

About

I am a research scientist in the Machine Learning and Optimization Group at Microsoft Research Redmond.

In Fall 2018, I was the VMware Research Fellow at the Simons Institute. I did my Ph.D. at MIT, where I was fortunate to work with Ankur Moitra. I also did my master's at MIT under the wonderful supervision of Nir Shavit.

My primary research interests are in learning theory and distributed algorithms, but I am broadly interested in many other things in TCS. I particularly like applications of analysis and analytic techniques to TCS problems.

As an undergrad at the University of Washington, I worked on the complexity of branching programs and on proving hardness of techniques used for naturally arising learning problems in database theory and AI.

In my free time I enjoy being remarkably mediocre at ultimate frisbee, chess, and piano, amongst other things.

Papers

Authors are ordered alphabetically unless obviously not (or marked).

Theses

Preprints

Conference and Workshop Papers

  1. SEVER: A Robust Meta-Algorithm for Stochastic Optimization
    Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Jacob Steinhardt, Alistair Stewart
    preliminary version in SecML 2018, Oral Presentation
    ICML 2019

  2. How Hard is Robust Mean Estimation?
    Samuel B. Hopkins, Jerry Li
    COLT 2019

  3. On Mean Estimation For General Norms with Statistical Queries
    Jerry Li, Aleksandar Nikolov, Ilya Razenshteyn, Erik Waingarten
    COLT 2019

  4. Privately Learning High-Dimensional Distributions
    Gautam Kamath, Jerry Li, Vikrant Singhal, Jonathan Ullman
    preliminary version in TPDP 2018
    COLT 2019

  5. Spectral Signatures for Backdoor Attacks
    Brandon Tran, Jerry Li, Aleksander Mądry
    NeurIPS 2018

  6. Byzantine Stochastic Gradient Descent
    Dan Alistarh, Zeyuan Allen-Zhu, Jerry Li
    NeurIPS 2018

  7. On the limitations of first order approximation in GAN dynamics
    Jerry Li, Aleksander Mądry, John Peebles, Ludwig Schmidt
    preliminary version in PADL 2017 as Towards Understanding the Dynamics of Generative Adversarial Networks
    ICML 2018

  8. Fast and Sample Near-Optimal Algorithms for Learning Multidimensional Histograms
    Ilias Diakonikolas, Jerry Li, Ludwig Schmidt
    COLT 2018

  9. Distributionally Linearizable Data Structures
    Dan Alistarh, Trevor Brown, Justin Kopinsky, Jerry Li, Giorgi Nadiradze
    SPAA 2018

  10. Mixture Models, Robustness, and Sum of Squares Proofs
    Samuel B. Hopkins, Jerry Li
    STOC 2018

  11. Robustly Learning a Gaussian: Getting Optimal Error, Efficiently
    Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Jerry Li, Ankur Moitra, Alistair Stewart
    SODA 2018

  12. Communication-Efficient Distributed Learning of Discrete Distributions
    Ilias Diakonikolas, Elena Grigorescu, Jerry Li, Abhiram Natarajan, Krzysztof Onak, Ludwig Schmidt
    NIPS 2017, Oral Presentation

  13. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks
    Dan Alistarh, Demjan Grubić, Jerry Li, Ryota Tomioka, Milan Vojnovic
    preliminary version in OPT 2016
    NIPS 2017, Spotlight Presentation
    Invited for presentation at NVIDIA GTC
    [code][poster][video]

  14. Being Robust (in High Dimensions) can be Practical
    Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Jerry Li, Ankur Moitra, Alistair Stewart
    ICML 2017
    [code]

  15. ZipML: An End-to-end Bitwise Framework for Dense Generalized Linear Models
    Hantian Zhang*, Jerry Li*, Kaan Kara, Dan Alistarh, Ji Liu, Ce Zhang
    *equal contribution
    ICML 2017

  16. The Power of Choice in Priority Scheduling
    Dan Alistarh, Justin Kopinsky, Jerry Li, Giorgi Nadiradze
    PODC 2017

  17. Robust Sparse Estimation Tasks in High Dimensions
    Jerry Li
    COLT 2017
    merged with this paper

  18. Robust Proper Learning for Mixtures of Gaussians via Systems of Polynomial Inequalities
    Jerry Li, Ludwig Schmidt
    COLT 2017

  19. Sample Optimal Density Estimation in Nearly-Linear Time
    Jayadev Acharya, Ilias Diakonikolas, Jerry Li, Ludwig Schmidt
    SODA 2017
    TCS+ talk by Ilias, which discussed the piecewise polynomial framework and our results at a high level

  20. Robust Estimators in High Dimensions, without the Computational Intractability
    Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Jerry Li, Ankur Moitra, Alistair Stewart
    FOCS 2016
    Invited to Highlights of Algorithms 2017
    Invited to appear in special issue of SIAM Journal on Computing for FOCS 2016.
    MIT News, USC Viterbi News

  21. Fast Algorithms for Segmented Regression
    Jayadev Acharya, Ilias Diakonikolas, Jerry Li, Ludwig Schmidt
    ICML 2016

  22. Replacing Mark Bits with Randomness in Fibonacci Heaps
    Jerry Li, John Peebles
    ICALP 2015

  23. Fast and Near-Optimal Algorithms for Approximating Distributions by Histograms
    Jayadev Acharya, Ilias Diakonikolas, Chinmay Hegde, Jerry Li, Ludwig Schmidt
    PODS 2015

  24. The SprayList: A Scalable Relaxed Priority Queue
    Dan Alistarh, Justin Kopinsky, Jerry Li, Nir Shavit
    PPoPP 2015, Best Artifact Award
    See also the full version
    [code]
    Slashdot, MIT News

  25. On the Importance of Registers for Computability
    Rati Gelashvili, Mohsen Ghaffari, Jerry Li, Nir Shavit
    OPODIS 2014

  The following two papers are subsumed by the journal paper Exact Model Counting of Query Expressions: Limitations of Propositional Methods below.

  26. Model Counting of Query Expressions: Limitations of Propositional Methods
    Paul Beame, Jerry Li, Sudeepa Roy, Dan Suciu
    ICDT 2014
    Invited to appear in special issue of ACM Transactions on Database Systems for ICDT 2014.

  27. Lower bounds for exact model counting and applications in probabilistic databases
    Paul Beame, Jerry Li, Sudeepa Roy, Dan Suciu
    UAI 2013, Plenary Presentation

Journal Papers

  • Robust Estimators in High Dimensions without the Computational Intractability
    Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, Alistair Stewart.
    SIAM Journal on Computing, 48(2), 2019. Special Issue for FOCS 2016.

  • Exact Model Counting of Query Expressions: Limitations of Propositional Methods
    Paul Beame, Jerry Li, Sudeepa Roy, Dan Suciu.
    ACM Transactions on Database Systems (TODS), Vol. 42, Issue 1, pages 1:1-1:46, March 2017.

Patents

  • Efficient training of neural networks
    Dan Alistarh, Jerry Li, Ryota Tomioka, Milan Vojnovic
    in submission

Other Writing

Talks

  • The Sample Complexity of Toeplitz Covariance Estimation

    • MIT Algorithms and Complexity Seminar, May 2019

  • Efficient Algorithms for High Dimensional Robust Learning

    • MSR AI Seminar, April 2019

  • Nearly Optimal Algorithms for Robust Mean Estimation

    • MIT Algorithms and Complexity Seminar, February 2019

    • TTIC Machine Learning Seminar, February 2019

    • MSR MLO Lunch, January 2019

    • UW Theory Seminar, January 2019

  • "Explicitly" Learning Mixtures of Gaussians

    • Simons Fellows Reading Group, September 2018

  • Robustly Learning a Gaussian in High Dimensions: Getting Optimal Error, Efficiently

    • SODA 2018, January 2018

  • Mixture Models, Robustness, and Sum-of-Squares Proofs

    • Google Algorithms Reading Group, July 2018

    • Microsoft Research Redmond, December 2017

    • MIT Algorithms and Complexity Seminar, November 2017

  • QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding

    • NIPS 2017, December 2017

  • Being Robust (in High Dimensions) can be Practical

    • ICML 2017, August 2017

  • Robust Proper Learning for Mixtures of Gaussians via Systems of Polynomial Inequalities

    • COLT 2017, July 2017

  • Efficient Robust Sparse Estimation in High Dimensions

    • COLT 2017, July 2017. Joint with Simon Du

  • Robust Estimators In High Dimensions without the Computational Intractability [slides]

    • TCS+, December 2016 [video]

    • FOCS, October 2016 [video]

    • ETH Theory Seminar, August 2016

    • UW Theory Lunch, July 2016

    • MIT Algorithms and Complexity Seminar, June 2016

  • Quantized Stochastic Gradient Descent

    • MIT ML Tea, October 2016

  • Fast Algorithms for Segmented Regression [slides]

  • Fast and Near-Optimal Algorithms for Approximating Distributions by Histograms [slides]

    • PODS 2015

  • Model Counting of Query Expressions: Limitations of Propositional Methods [slides]

    • ICDT 2015

    • MIT Theory Lunch, 2014

Teaching

At MIT
  1. TA for 6.852, Distributed Algorithms, Fall 2014.

At UW
  1. TA for the UW Math REU under Dr. James Morrow, Summer 2013.

  2. TA for MATH 334/5/6, Advanced Accelerated Second Year Honors Calculus, 2012-2013.

  3. TA for CS 373, Algorithms and Data Structures, Spring 2012.

  4. TA for CS 344, Databases, Winter 2012.

Misc

* This might be false