## Convex analysis

“In fact the great watershed in optimization isn’t between linearity and nonlinearity, but convexity and nonconvexity.”

— Rockafellar

Introduction and basic concepts

Convex analysis is a beautiful mathematical theory with many applications in science and engineering. Its most obvious significance is in optimization theory, where the tools of convex analysis provide efficient methods to solve convex optimization problems. It is also useful in many cases of mathematical modelling using optimality principles, as in, for example, thermodynamics. As time and motivation permits, I will try to learn more about the applications and write up notes. In this post I will review the basics of convex analysis in order to set the stage for future writing on applications.

Figure 1. The green region (left) is non-convex and the blue region (right) is convex.

Two fundamental concepts are those of a convex set and a convex function. A set $A \subseteq \mathbb{R}^n$ is convex if for all elements $x,y\in S$ and $t\in (0,1)$, the linear combination $tx+(1-t)y\in S$ also belongs to the set. Thus, a set is convex if the line connecting any pair of its elements is contained in the set. This is illustrated in Figure 1, where the green region on the left is not convex, while the blue region on the right is convex. A function $f:D\to\mathbb{R}^n$, where $D\subseteq\mathbb{R}^n$, is convex if its domain $D$ is a convex set and for any elements $x,y\in D$ and $t\in (0,1)$,

$f(tx+(1-t)y) \leq tf(x) + (1-t)f(y)$.

The second condition means that the linear interpolation between any two points in the domain always lies above function itself. If the inequality is strict, the function is said to be strictly convex. Figure 2 shows a non-convex and a convex function and illustrates the linear interpolation condition. A function $f$ is said to be concave if $-f$ is convex. Convex functions have many useful special properties, among them are:

• All sublevel sets $\{x \in \mathbb{R}^n | f(x) \leq \alpha\}$ of a convex function $f$ are convex sets. The converse does not hold since a larger class of functions than just the convex functions has only convex sublevel sets; these functions are called quasi-convex function.
• A convex function has no saddle points.
• A convex function $f$ has only global minima, i.e. all local minima are also global minima. Often there is a unique strict global minimum, but it is also possible that there are several $x^*$ satisfying $f^* = f(x^*) \leq f(x)$, for all $x$.
• The maximum/supremum of any (non-empty) family of convex functions is itself a convex function. That is, $g(x) = \sup_{\alpha\in F} f_{\alpha}(x)$ is convex whenever $F$ is non-empty and all $f_{\alpha}$ are convex.
• The function $g(x) = \inf_{y\in C} f(x,y)$, where $C \subseteq \mathbb{R}^k$ is a non-empty convex set, is also convex. Thus, minimization/infimization over one or more variables of a convex function yields a new convex function, provided that the minimization/infimization is carried out over a convex set.

A very common convention introduces a slight modification of the above notion of a convex function. It is customary in convex analysis to formally extend the domain of functions to the whole $\mathbb{R}^n$ and their codomain to $\mathbb{R}\cup\{-\infty,+\infty\}$. When $D\subseteq\mathbb{R}^n$ is the original domain, this is done by defining $f(x)=+\infty$ for all elements $x\in \mathbb{R}\setminus D$ that are outside the domain. This requires some special rules for arithmetic expressions involving infinity, e.g. $0 \cdot \infty = 0$ and $(\pm \infty) + (\pm \infty) = \pm\infty$, while $\infty-\infty$ is not needed and left undefined. By extending the domain and codomain in this fashion, it is possible to simplify the notation a bit and avoid technical special cases.

Figure 2. A non-convex function (red line) and a convex function (blue line). The red curve is non-convex because e.g. the black line segment between points on the curve is below the curve.

Convex optimization

A convex optimization problem is commonly defined as optimization problem of the form

(C) $\displaystyle \begin{array}{cc} \text{min}_{x\in \mathbb{R}^n} f(x) \\ \text{subject to} -g_i(x) \leq 0 \\ a^T_j x - b_j = 0 \end{array}$

where the objective $f(x)$ and the functions $g_i(x)$, $i = 1,\ldots,N_I$ are all convex. Additionally, $a_j\in\mathbb{R}^n$ and $b_j\in\mathbb{R}$, $j=1,\ldots,N_E$. Note that the last requirements means that only affine (“linear”) equality constraints are allowed. A convex optimization problem can be reformulated to an equivalent optimization problem which is not convex on this definition. For this reason, a more general terminology according to which a convex optimization problem is the minimization of a convex function on a convex domain may be more natural. With this more general terminology, a convex optimization problem is in standard form when it is written in the form (C).

Convex optimization problems are relatively easy to solve due to the absence of saddle points and local minima that are not also global minima. For differentiable problems, a gradient descent is guaranteed to end up at a global minimum, provided that the steps are sufficiently small (how small depends on the problem). More generally, at least differentiable convex optimization problems may be solved by for example the log-barrier method, which is one of the simpler interior-point methods to grasp (Boyd and Vandenberghe, Chapter 11). The idea behind the log-barrier method is approximate the inequality constraints by adding a logarithmic barrier of the form

$\displaystyle -\frac{1}{t} \sum_{i} \log(-g_i(x))$

to the objective function. For a particular value of $t > 0$, the modified optimization problem may be solved by Newton’s method or a quasi-Newton method. The larger the value of $t>0$, the better the approximation, but it also yields a smaller gradient contribution from the logarithmic barrier, making it harder for a Newton method to avoid steps that lead out of the region allowed by the constraints. For this reason, one solves a sequence of modified problems, with progressively larger $t$.

Legendre-Fenchel transforms

One of the most theoretically useful tools in convex analysis is the Legendre-Fenchel transform. Given a (convex or non-convex) function $f:D\to\mathbb{R}$, where $D\subseteq\mathbb{R}^n$ as usual, the Legendre-Fenchel transform is defined as

$\displaystyle f^*(y) = \sup_{x\in D} (y^T x - f(x)), \quad y\in\mathbb{R}^n$.

(In fact, the transformation may be defined more generally as a transformation between possibly infinite-dimensional dual spaces with a kind of scalar product, or more accurately, pairing functional $(y,x)$ between $y$ and $x$. Here I stick to $\mathbb{R}^n$ for simplicity.) The function $f^*(y)$ is called the convex conjugate of $f(x)$. The convex conjugate is always a convex function, even when the original function is non-convex. When $f(x)$ is strictly convex, differentiable and defined over the whole $\mathbb{R}^n$, the convex conjugate can be characterized further. For a given $y$, the unique maximum of the strictly concave function

$y^T x - f(x)$

occurs at the point satisfying

$\nabla (y^T x - f(x)) = y - \nabla f(x) = 0$.

Since the maximum is unique, there is an inverse function $g(y)$ of the gradient $\nabla f(x)$. Thus,

$f^*(y) = (\nabla f(x))^T x - f(x) = y^T g(y) - f(g(y))$

where the function arguments satisfy $y = \nabla f(x)$ and $\quad x = g(y)$. What has happened here is that the convex conjugate is expressed as a function of the gradient of the original function. Apart from sign conventions, this is exactly what happens in thermodynamics. For example, the internal energy $U(S,V,N)$ as a function of entropy, volume and mole number may be Legendre transformed to a function of temperature, pressure and chemical potential. The latter three quantities are intensive quantities, defined at equilibrium as the derivatives of the internal energy with respect the corresponding three extensive variables. The convex conjugate of the internal energy is given by

$\displaystyle U^*(T,-p,\mu) = \sup_{S,V,N} (T S + (-p)V + \mu N - U(S,V,N))$

If the internal energy is a strictly convex and differentiable function defined for all triples $(S,V,N)$ of non-negative numbers, the function inside the supremum is strictly concave with a unique maximum satisfying

$\displaystyle T = \frac{\partial U}{\partial S}, \quad -p = \frac{\partial U}{\partial V}, \quad \mu = \frac{\partial U}{\partial N}$.

In other words, $(T,-p,N)^T$ is the gradient of $U(S,V,N)$. Letting $g(T,-p,N)$ denote the inverse of the gradient, the result is

$U^*(T,-p,\mu) = (T,-p,\mu) \cdot g(T,-p,\mu) - U(g(T,-p,\mu))$.

Apart from the minus sign before the pressure, the convex conjugate of the internal energy is the Gibbs free energy $G(T,p,\mu) = U^*(T,-p,\mu)$. Naturally, Legendre-Fenchel transforming only some of the variables in the internal energy yields other thermodynamic potentials and the example can be generalized to more variables than just $S,V,N$. In order for a double Legendre transform to take us back to where we started, it is essential that all thermodynamic potentials are either concave or convex in each of their arguments.

Returning to the general form of the Legendre-Fenchel transform, one may wonder what properties a doubly transformed function obtains. An investigation of the convexity of the doubly transformed function, also called convex biconjugate,

$\displaystyle f^{**}(x) = \sup_{y} (x^T y - f^*(y)) = \sup_{y} \left[x^T y - \sup_{z\in D} (y^T z - f(z)) \right]$

reveals that it is convex. Furthermore, the convex biconjugate $f^{**}(x)$ is the maximal convex function that still bounds the original function from below, $f^{**}(x) \leq f(x)$. If $f(x)$ is convex, then it is its own biconjugate, $f(x)=f^{**}(x)$ (here I neglect the issue of lower semi-continuity). These properties make the Legendre-Fenchel transform extremely useful, as it provides an explicit construction of a convex function with global minima which coincides with the global minima of the original, arbitrary function. This ability to make a function convex without changing its global minima enables simplification of many optimality models, since one need not be concerned with the possibility of saddle points and local minima. Instead the global minima can be characterized or identified using optimality conditions for convex functions. For example, the global minima of $f(x)$ are exactly those points for which the subdifferentials of $f^{**}(x)$ include zero. A subdifferential of a convex function at a point $(x,f^{**}(x))$ is the gradient of any line (affine function) that goes through the point, yet nowhere goes above the convex function. More formally, for convex functions any vector $a\in\mathbb{R}^n$ satisfying

$f^{**}(x+y) \geq f^{**}(x) + a^T (y-x)$,

for all $y$, is a subgradient at $x$. As is clear from the inequality, $x$ is a global minimum of $f^{**}(x)$ if and only if $0$ is a subgradient of $f^{**}(x)$. The condition that $0$ is a subgradient of the convex biconjugate is much easier to work with than a more direct condition for a global minimum of the original, possibly non-convex, function.

Figure 3 below illustrates the convex conjugate and biconjugate of a function. The original function is plotted as a red line on the left. It is non-convex and has several local minima (which are not global minima). Its convex conjugate is a convex function, as shown in the middle plot. Finally, the right plot shows both the original function (red line) and its convex biconjugate (blue line). Note that the biconjugate coincides with the original function in, for example, the region $x \leq -1$ and that its only local minimum is also the global minimum.

Figure 3. On the left a non-convex function (red line), in the middle its convex conjugate (purple line), and on the right the original function (red) is shown together with its biconjugate (blue).

Discrete domains and convexity

In the above discussion convexity has been defined for subsets of the continuous set $\mathbb{R}^n$. By the above definition a discrete set, such as a subset of $\mathbb{Z}^n$, is not a convex set. Due to the convenient properties of convex optimization problems, much interest has been directed at formulating a notion of discrete convexity that allows many of the results and tools of convex analysis to be carried over to discrete optimization. A discrete version of convex analysis is perhaps bound to at least be less successful in providing efficient generic optimization methods. Nevertheless, there are approaches to discrete convexity that provide results analogous to some of the results of convex analysis. An example is the connection found by Takashi Ui between notions of discrete convexity and lack of local minima (which are not also global minima).

References

Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press (2008). [The book is out of print, but available for free at one of the authors’ homepage.]

R. T. Rockafellar. Lagrange multipliers and optimality. SIAM Review, 35:183 (1993) [Also available at the author’s homepage.]

T. Ui. A Note on Discrete Convexity and Local Optimality. Japan J. Indust. Appl. Math. 23:21 (2006) [Also available at the author’s homepage.]