“In fact the great watershed in optimization isn’t between linearity and nonlinearity, but convexity and nonconvexity.”
Introduction and basic concepts
Convex analysis is a beautiful mathematical theory with many applications in science and engineering. Its most obvious significance is in optimization theory, where the tools of convex analysis provide efficient methods to solve convex optimization problems. It is also useful in many cases of mathematical modelling using optimality principles, as in, for example, thermodynamics. As time and motivation permits, I will try to learn more about the applications and write up notes. In this post I will review the basics of convex analysis in order to set the stage for future writing on applications.
Two fundamental concepts are those of a convex set and a convex function. A set is convex if for all elements and , the linear combination also belongs to the set. Thus, a set is convex if the line connecting any pair of its elements is contained in the set. This is illustrated in Figure 1, where the green region on the left is not convex, while the blue region on the right is convex. A function , where , is convex if its domain is a convex set and for any elements and ,
The second condition means that the linear interpolation between any two points in the domain always lies above function itself. If the inequality is strict, the function is said to be strictly convex. Figure 2 shows a non-convex and a convex function and illustrates the linear interpolation condition. A function is said to be concave if is convex. Convex functions have many useful special properties, among them are:
- All sublevel sets of a convex function are convex sets. The converse does not hold since a larger class of functions than just the convex functions has only convex sublevel sets; these functions are called quasi-convex function.
- A convex function has no saddle points.
- A convex function has only global minima, i.e. all local minima are also global minima. Often there is a unique strict global minimum, but it is also possible that there are several satisfying , for all .
- The maximum/supremum of any (non-empty) family of convex functions is itself a convex function. That is, is convex whenever is non-empty and all are convex.
- The function , where is a non-empty convex set, is also convex. Thus, minimization/infimization over one or more variables of a convex function yields a new convex function, provided that the minimization/infimization is carried out over a convex set.
A very common convention introduces a slight modification of the above notion of a convex function. It is customary in convex analysis to formally extend the domain of functions to the whole and their codomain to . When is the original domain, this is done by defining for all elements that are outside the domain. This requires some special rules for arithmetic expressions involving infinity, e.g. and , while is not needed and left undefined. By extending the domain and codomain in this fashion, it is possible to simplify the notation a bit and avoid technical special cases.
A convex optimization problem is commonly defined as optimization problem of the form
where the objective and the functions , are all convex. Additionally, and , . Note that the last requirements means that only affine (“linear”) equality constraints are allowed. A convex optimization problem can be reformulated to an equivalent optimization problem which is not convex on this definition. For this reason, a more general terminology according to which a convex optimization problem is the minimization of a convex function on a convex domain may be more natural. With this more general terminology, a convex optimization problem is in standard form when it is written in the form (C).
Convex optimization problems are relatively easy to solve due to the absence of saddle points and local minima that are not also global minima. For differentiable problems, a gradient descent is guaranteed to end up at a global minimum, provided that the steps are sufficiently small (how small depends on the problem). More generally, at least differentiable convex optimization problems may be solved by for example the log-barrier method, which is one of the simpler interior-point methods to grasp (Boyd and Vandenberghe, Chapter 11). The idea behind the log-barrier method is approximate the inequality constraints by adding a logarithmic barrier of the form
to the objective function. For a particular value of , the modified optimization problem may be solved by Newton’s method or a quasi-Newton method. The larger the value of , the better the approximation, but it also yields a smaller gradient contribution from the logarithmic barrier, making it harder for a Newton method to avoid steps that lead out of the region allowed by the constraints. For this reason, one solves a sequence of modified problems, with progressively larger .
One of the most theoretically useful tools in convex analysis is the Legendre-Fenchel transform. Given a (convex or non-convex) function , where as usual, the Legendre-Fenchel transform is defined as
(In fact, the transformation may be defined more generally as a transformation between possibly infinite-dimensional dual spaces with a kind of scalar product, or more accurately, pairing functional between and . Here I stick to for simplicity.) The function is called the convex conjugate of . The convex conjugate is always a convex function, even when the original function is non-convex. When is strictly convex, differentiable and defined over the whole , the convex conjugate can be characterized further. For a given , the unique maximum of the strictly concave function
occurs at the point satisfying
Since the maximum is unique, there is an inverse function of the gradient . Thus,
where the function arguments satisfy and . What has happened here is that the convex conjugate is expressed as a function of the gradient of the original function. Apart from sign conventions, this is exactly what happens in thermodynamics. For example, the internal energy as a function of entropy, volume and mole number may be Legendre transformed to a function of temperature, pressure and chemical potential. The latter three quantities are intensive quantities, defined at equilibrium as the derivatives of the internal energy with respect the corresponding three extensive variables. The convex conjugate of the internal energy is given by
If the internal energy is a strictly convex and differentiable function defined for all triples of non-negative numbers, the function inside the supremum is strictly concave with a unique maximum satisfying
In other words, is the gradient of . Letting denote the inverse of the gradient, the result is
Apart from the minus sign before the pressure, the convex conjugate of the internal energy is the Gibbs free energy . Naturally, Legendre-Fenchel transforming only some of the variables in the internal energy yields other thermodynamic potentials and the example can be generalized to more variables than just . In order for a double Legendre transform to take us back to where we started, it is essential that all thermodynamic potentials are either concave or convex in each of their arguments.
Returning to the general form of the Legendre-Fenchel transform, one may wonder what properties a doubly transformed function obtains. An investigation of the convexity of the doubly transformed function, also called convex biconjugate,
reveals that it is convex. Furthermore, the convex biconjugate is the maximal convex function that still bounds the original function from below, . If is convex, then it is its own biconjugate, (here I neglect the issue of lower semi-continuity). These properties make the Legendre-Fenchel transform extremely useful, as it provides an explicit construction of a convex function with global minima which coincides with the global minima of the original, arbitrary function. This ability to make a function convex without changing its global minima enables simplification of many optimality models, since one need not be concerned with the possibility of saddle points and local minima. Instead the global minima can be characterized or identified using optimality conditions for convex functions. For example, the global minima of are exactly those points for which the subdifferentials of include zero. A subdifferential of a convex function at a point is the gradient of any line (affine function) that goes through the point, yet nowhere goes above the convex function. More formally, for convex functions any vector satisfying
for all , is a subgradient at . As is clear from the inequality, is a global minimum of if and only if is a subgradient of . The condition that is a subgradient of the convex biconjugate is much easier to work with than a more direct condition for a global minimum of the original, possibly non-convex, function.
Figure 3 below illustrates the convex conjugate and biconjugate of a function. The original function is plotted as a red line on the left. It is non-convex and has several local minima (which are not global minima). Its convex conjugate is a convex function, as shown in the middle plot. Finally, the right plot shows both the original function (red line) and its convex biconjugate (blue line). Note that the biconjugate coincides with the original function in, for example, the region and that its only local minimum is also the global minimum.
Discrete domains and convexity
In the above discussion convexity has been defined for subsets of the continuous set . By the above definition a discrete set, such as a subset of , is not a convex set. Due to the convenient properties of convex optimization problems, much interest has been directed at formulating a notion of discrete convexity that allows many of the results and tools of convex analysis to be carried over to discrete optimization. A discrete version of convex analysis is perhaps bound to at least be less successful in providing efficient generic optimization methods. Nevertheless, there are approaches to discrete convexity that provide results analogous to some of the results of convex analysis. An example is the connection found by Takashi Ui between notions of discrete convexity and lack of local minima (which are not also global minima).
Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press (2008). [The book is out of print, but available for free at one of the authors’ homepage.]
R. T. Rockafellar. Lagrange multipliers and optimality. SIAM Review, 35:183 (1993) [Also available at the author’s homepage.]