### 2019 Papers

Active strict saddles in nonsmooth optimization.
Damek Davis, Dmitriy Drusvyatskiy.
arxiv abs/1912.07146.
We identify a strict-saddle property for nonsmooth functions that guarantees that typical algorithms (proximal gradient, proximal point, and prox-linear) only converge to local minimizers, when randomly initialized. The new strict saddle property combines negative curvature with existence of an active manifold, and provably holds for generic semi-algebraic problems.

From low probability to high confidence in stochastic convex optimization.
Damek Davis, Dmitriy Drusvyatskiy, Lin Xiao, Junyu Zhang.
arxiv abs/1907.13307.
We develop a generic procedure that augments a wide class of stochastic algorithms with high confidence guarantees at an overhead cost that is only logarithmic in the confidence level and polylogarithmic in the condition number. We discuss consequences for both streaming (online) algorithms and offline algorithms based on empirical risk minimization.

Low-rank matrix recovery with composite optimization : good conditioning and rapid convergence.
Vasileios Charisopoulos, Yudong Chen, Damek Davis, Mateo Díaz, Lijun Ding, Dmitriy Drusvyatskiy.
arxiv abs/1904.10020.
We show that for a variety of low-rank matrix recovery tasks, nonsmooth problem formulations are automatically well-conditioned, thereby endowing standard numerical methods with optimal sample and computation efficiency guarantees.

Complexity of Highly Parallel Non-Smooth Convex Optimization.
Sébastien Bubeck, Qijia Jiang, Yin Tat Lee, Yuanzhi Li, Aaron Sidford.
arxiv abs/1906.10655.
We give a parallel optimization algorithm for non-smooth convex functions and show that this has the optimal number of rounds. Our lower bound improves upon a decades-old construction by Nemirovski.

The Randomized Midpoint Method for Log-Concave Sampling.
Ruoqi Shen, Yin Tat Lee.
arxiv abs/1909.05503.
We propose a new framework to discretize stochastic differential equations and give a much faster sampling algorithm for well-conditioned log-concave distributions.

Online Continuous DR-Submodular Maximization with Long-Term Budget Constraints.
arxiv abs/1907.00316.
We consider the online continuous DR-submodular maximization problem with long-term budget constraints where both the DR-submodular utility functions and the linear constraint functions are arriving online and we propose an algorithm that obtains sub-linear regret and total budget violation bounds for this problem.

Escaping from saddle points on Riemannian manifolds.
Yue Sun, Nicolas Flammarion, Maryam Fazel.
arXiv abs/1906.07355.
We show that perturbed Riemannian gradient descent on Riemannian manifolds escapes from saddle points and converges to an approximate local minimum.

Uniform graphical convergence of subgradients in nonconvex optimization and learning.
Damek Davis, Dmitriy Drusvyatskiy.
arXiv abs/1810.07590.
We establish finite-sample bounds on uniform subgradient estimation of weakly convex functions through sampling.

LQR through the Lens of First Order Methods: Discrete-time Case.
Jingjing Bu, Afshin Mesbahi, Maryam Fazel, Mehran Mesbahi.
arxiv abs/1907.08921.
We consider the Linear Quadratic Regulator (LQR) problem, a central problem in control theory, as optimizing a nonconvex cost function over control policies and charaterize its analytical properties (smoothness, coerciveness, quadratic growth) which help obtain stepsize criteria for gradient descent and natural gradient descent for linear convergence to global optima. An optimal stepsize for the quasi-Newton iteration is also proposed that recovers Hewer's algorithm.

### 2018 Papers

Width-Independence Beyond Linear Objectives: Distributed Fair Packing and Covering Algorithms.
Jelena Diakonikolas, Maryam Fazel, Lorenzo Orecchia.
arxiv abs/1808.02517.
Fair allocation of resources has been broadly studied in political science, economic theory, operations research, and networking. We consider general α-fair resource allocation problems, i.e., maximizing an α-fair utility function under packing constraints, and give an improved distributed, width-independent algorithm, whose running time depends only polylogarithmically on the size of entries in the constraint matrix.

Stochastic model-based minimization of weakly convex functions.
Damek Davis, Dmitriy Drusvyatskiy.
arXiv abs/1803.06523.
This work proves a convergence rate of $$O(k^{-1/4})$$ for a wide class of algorithms that are based on sampling one sided models of the function. Primary examples are the stochastic subgradient method and the stochastic prox-linear algorithm for minimizing a composition of a convex function with a smooth map.

Stochastic subgradient method converges on tame functions.
Damek Davis, Dmitriy Drusvyatskiy, Sham Kakade, Jason D. Lee.
arxiv abs/1804.07795.
We prove that the stochastic subgradient method, on any Whitney stratifiable'' locally Lipschitz function, produces limit points that are all first-order stationary. The class of Whitney stratifiable functions is virtually exhaustive in data science, and in particular, includes all popular deep learning architectures.

Global Convergence of Policy Gradient Methods for Linearized Control Problems.
Maryam Fazel, Rong Ge, Sham M. Kakade, Mehran Mesbahi.
arXiv abs/1801.05039.
This work shows that (model free) policy gradient methods globally converge to the optimal solution and are efficient with regards to their sample and computational complexities.

Stochastic subgradient method converges at the rate $$O(k^{−1/4})$$ on weakly convex functions.
Damek Davis, Dmitriy Drusvyatskiy.
arXiv abs/1802.02988.
This work proves that the proximal stochastic subgradient method converges at a rate $$O(k^{-1/4})$$ on weakly convex problems. In particular, it resolves the long-standing open question on the rate of convergence of the proximal stochastic gradient method (without batching) for minimizing a sum of a smooth function and a proximable convex function.

Competitive Online Algorithms for Resource Allocation over the Positive Semidefinite Cone.
Reza Eghbali, James Saunderson, Maryam Fazel.
arXiv abs/1802.01312.
Competitive Online Algorithms for Resource Allocation over the Positive Semidefinite Cone

An optimal first order method based on optimal quadratic averaging.
Dmitriy Drusvyatskiy, Maryam Fazel, Scott Roy.
arXiv abs/1604.06543.
An optimal first order method based on optimal quadratic averaging

Variational Gram Functions- Convex Analysis and Optimization.
Amin Jalali, Maryam Fazel, Lin Xiao.
arXiv abs/1507.04734.
This paper considers a new class of convex penalty functions, called variational Gram functions (VGFs), that can promote pairwise relations, such as orthogonality, among a set of vectors in a vector space.

### 2017 Papers

The Many Faces of Degeneracy in Conic Optimization.
Dmitriy Drusvyatskiy, Henry Wolkowicz.
Foundations and Trends in Optimization, Vol. 3, No. 2, pp 77-170, 2017..
This work surveys the roles of Slater's condition and the facial reduction technique in conic optimization.

The proximal point method revisited.
Dmitriy Drusvyatskiy.
Survey submitted to SIAG/OPT Views and News.
This short survey revisits the role of the proximal point method in large scale optimization, focusing on three recent examples: a proximally guided stochastic subgradient method, the prox-linear algorithm, and Catalyst generic acceleration.

Catalyst Acceleration for First-order Convex Optimization: from Theory to Practice.
Hongzhou Lin, Julien Mairal, Zaid Harchaoui.
arXiv abs/1712.05654.
The paper presents the comprehensive theory and practice of the Catalyst acceleration scheme for accelerating gradient-based convex optimization methods.
Catalyst applies to a large class of optimization algorithms, including gradient descent, block coordinate descent, incremental algorithms such as SAG, SAGA, SDCA, SVRG, Finito/MISO, and their proximal variants.

Invariances and Data Augmentation for Supervised Music Transcription.
John Thickstun, Zaid Harchaoui, Dean Foster, Sham M. Kakade.
arXiv abs/1711.04845.
This paper presents the winning entry to the 2017 MIREX Multiple Fundamental Frequency Estimation evaluation. The method is based on a translation-invariant convolutional architecture.

The nonsmooth landscape of phase retrieval.
Damek Davis, Dmitriy Drusvyatskiy, Courtney Paquette.
arXiv abs/1711.03247.
This paper shows that the standard Polyak subgradient method converges linearly for the phase retrieval problem, when initialized within a constant relative distance of the underlying signal. The second part of the paper analyzes concentration of stationary points of the subsampled objective.

An homotopy method for $$\ell_p$$ regression provably beyond self-concordance and in input-sparsity time.
Sébastien Bubeck, Michael B. Cohen, Yin Tat Lee, Yuanzhi Li.
arXiv abs/1711.01328.
Solving two long open problems about $\ell_p$ ball:
1. The existence of input sparsity time algorithm for lp regression.
2. The inexistence of good self-concordance barrier.

k-server via multiscale entropic regularization.
Sébastien Bubeck, Michael B. Cohen, James R. Lee, Yin Tat Lee, Aleksander Madry.
arXiv abs/1711.01085. [Blog]
A breakthough in the k-server problem. The first o(k)-competitive algorithm for hierarchically separated trees.

Convergence Rate of Riemannian Hamiltonian Monte Carlo and Faster Polytope Volume Computation.
Yin Tat Lee, Santosh S. Vempala.
arXiv abs/1710.06261.
A quadratic improvement of polytope volume computation!

The Paulsen Problem, Continuous Operator Scaling, and Smoothed Analysis.
Tsz Chiu Kwok, Lap Chi Lau, Yin Tat Lee, Akshay Ramachandran.
arXiv abs/1710.02587.
Resolving the Paulsen problem - a basic open problem in frame theory with applications from signal processing, communication theory and theoretical computer science.

Catalyst Acceleration for Gradient-Based Non-Convex Optimization.
Hongzhou Lin, Julien Mairal, Zaid Harchaoui.
arXiv abs/1703.10993.
The paper presents the extension of the Catalyst acceleration scheme for gradient non-convex optimization. When the objective is convex, 4WD-Catalyst enjoys the same properties as the regular Catalyst. When the objective is nonconvex, it achieves the best known convergence rate to stationary points for first-order methods.