In many Data Science applications, non-smooth features arise naturally, e.g., in the analysis of sparsity, low rank structure, or low complexity regularization in general. Constrained optimization problems can be recast as unconstrained problems with non-smooth objective functions. Envelope functions, dual Lagrangian relaxations, and many other (algorithmically) important functions are naturally non-smooth. Hence, non-smoothness is not just an artifact; it is a feature that usually comes with a certain structure, and the key is to recognize structures that can be leveraged. Ignoring the non-smoothness typically leads to inaccurate solutions or slow algorithms.
The course consists of two parts: (1) basic and advanced concepts of non-smooth analysis, and (2) algorithms for non-smooth non-convex optimization that are widely used in data science (Machine Learning, Computer Vision, Image Processing, Statistics, ...). The first milestone in Part 1 is the exploration of Fermat's rule, a first-order optimality condition for non-smooth functions. This requires us to introduce generalized derivatives and allows us to recover commonly employed Lagrange multiplier rules and KKT conditions as special cases. In order to study algorithms for non-smooth non-convex optimization problems in Part 2, we first introduce the class of semi-algebraic and tame functions, which comprises virtually all practically relevant functions while excluding pathological ones. This leads to insights into recent breakthroughs in non-smooth non-convex optimization that yield strong convergence results to a stationary point, based on the so-called Kurdyka-Łojasiewicz inequality. Besides many classical applications, this lecture presents several research topics in optimization for machine learning, for example,
(i) the convergence of stochastic gradient descent (SGD) to stationary points,
(ii) the mathematically rigorous formulation of automatic differentiation programs as implemented, e.g., in PyTorch (see the short illustration after this list),
and (iii) bilevel optimization, which can be seen as a formalization of parameter learning problems.
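As a small illustration of topic (ii), the following sketch (not part of the official course material; the setup is only assumed here) applies PyTorch's automatic differentiation to the non-smooth function |x|. At the kink x = 0 the classical derivative does not exist, and autograd returns one particular value, which is exactly the kind of behaviour the course analyses rigorously.

```python
# Illustrative sketch (assumed example, not official course code):
# automatic differentiation of a non-smooth function in PyTorch.
import torch

# |x| is non-smooth at x = 0: the classical derivative does not exist there.
x = torch.tensor(0.0, requires_grad=True)
y = torch.abs(x)
y.backward()

# PyTorch reports 0.0 at the kink, one element of the subdifferential [-1, 1]
# of |x| at 0; making such outputs mathematically rigorous is a course topic.
print(x.grad)  # tensor(0.)
```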
After taking the course, students are familiar with the most relevant concepts of non-smooth analysis and optimization beyond convexity. They are able to read and understand related scientific literature, can assess the difficulty of optimization problems arising in machine learning or computer vision applications, and can select an efficient algorithm accordingly. Moreover, they are able to carry out the convergence analysis of non-convex optimization algorithms, and they develop basic skills in solving practical problems with Python.
- Lecturer: Tejas Natu
- Lecturer: Peter Ochs
- Lecturer: Shida Wang