## Convex analysis and thermodynamics

A previous post briefly reviewed convex analysis. Here I’ll review the application of convexity in basic thermodynamics.

Equilibrium states

The concept of thermodynamic equilibrium is a generalization of mechanical equilibrium, where all forces and torques cancel each other. Informally, the idea is that a system in thermodynamic equilibrium has stable, unchanging macroscopic properties, which may be characterized by an n-tuple of extensive variables. How many and which variables are needed depends on the system at hand and is, as far as I know, an empirical matter. For simple fluids, which consist of only one kind of particle, the extensive variables are the internal energy, the volume, and the number of particles. More complicated systems require additional variables. In general, the extensive variables may be collected in a vector $X=(X_1,X_2,\ldots,X_n)\in\mathbb{R}^n$, with the first variable being the internal energy $U=X_1$. Some of the extensive variables may be fixed by external constraints, while others are free to vary.

Identifying the n-tuple $(X_1,\ldots,X_n)$ with an equilibrium state, the first assumption is:

Postulate 1: The manifold of equilibrium states is a convex set.

For simplicity, this convex set will be taken to have the form $E=\{(X_1,\ldots,X_n)\in\mathbb{R}^n | 0 \leq X_i\}$.

Postulate 2: There is a function $S:E\to\mathbb{R}$, called entropy, of the extensive variables of a system. At thermodynamic equilibrium, the extensive variables take values that maximize the entropy subject to the external constraints.

Before the entropy function can play a useful role in the theory, it is necessary to know some of its properties:

Postulate 3: The entropy is (i) additive over subsystems, (ii) homogeneous in the sense that $S(\zeta X_1,\ldots, \zeta X_n) = \zeta S(X_1,\ldots,X_n)$ (with $0 < \zeta$), and (iii) a strictly monotonically increasing function of the internal energy $U$.

When two systems with equilibrium states $E_1=E_2\subseteq\mathbb{R}^n$ are considered subsystems of a larger system, the equilibrium states of the joint system can be taken to be the convex set $E = E_1\times E_2$. For convenience, the manifolds $E_1$ and $E_2$ are assumed to be equal. This can always be achieved by adding to the manifold fictional coordinates $X^{(s)}_j$ that are considered subject to constraints $X^{(s)}_j=0$. These constraints are lifted when the two subsystems are able to interact by exchanging the quantity represented by $X^{(s)}_j$.

Consider, for example, two rigid vessels containing hydrogen molecules and oxygen molecules, respectively. When isolated, the systems are characterized by their internal energies, volumes and number of particles (hydrogen molecules and oxygen molecules, respectively). Thus the extensive variables of the first system could be taken to be $X^{(1)} = (U^{(1)},V^{(1)},N^{(1)}_{\text{H}_2})$, but it is convenient to let $X^{(1)} = (U^{(1)},V^{(1)},N^{(1)}_{\text{H}_2},N^{(1)}_{\text{O}_2},N^{(1)}_{\text{H}_2\text{O}})$ and consider $N^{(1)}_{\text{O}_2} = N^{(1)}_{\text{H}_2\text{O}} = 0$ as an additional external constraint on the system, since no oxygen or water molecules can enter the isolated system. Since the system is isolated and the vessel is rigid, the variables are also subject to the constraints $(U^{(1)},V^{(1)},N^{(1)}_{\text{H}_2}) = (u_1,v_1,n_{\text{H}_2})$. The variables for the vessel containing oxygen molecules are chosen and constrained analogously.

When the two vessels are brought into contact and their volumes are connected, the joint system is characterized by $X^{(12)} = (X^{(1)},X^{(2)})$. Letting $X=X^{(1)}+X^{(2)}$ denote the sum of extensive variables for the subsystems, the relevant constraints are now:

• $U=u_1+u_2$ (constant total energy),
• $V=v_1+v_2$ (constant total volume),
• $N_{\text{H}_2} + N_{\text{H}_2\text{O}} = n_{\text{H}_2}$ (constant number of hydrogen atoms), and
• $N_{\text{O}_2} + \tfrac{1}{2} N_{\text{H}_2\text{O}} = n_{\text{O}_2}$ (constant number of oxygen atoms).

Notice that the joint system has a new macroscopic degree of freedom, $N_{\text{H}_2\text{O}}$, that is not a real degree of freedom in either of the subsystems before they are brought into contact.
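The four constraints above are linear in $X$, so they can be collected in a matrix acting on the vector of summed extensive variables. The following is a minimal sketch of that bookkeeping, not part of the original argument: the ordering of the variables, the helper names `apply` and `feasible_state`, and all numerical values are assumptions chosen for illustration.

```python
# Extensive variables of the joint system, in an assumed fixed order.
NAMES = ("U", "V", "N_H2", "N_O2", "N_H2O")

# Each row of A encodes one of the four conserved linear combinations.
A = [
    [1.0, 0.0, 0.0, 0.0, 0.0],  # total energy
    [0.0, 1.0, 0.0, 0.0, 0.0],  # total volume
    [0.0, 0.0, 1.0, 0.0, 1.0],  # N_H2 + N_H2O
    [0.0, 0.0, 0.0, 1.0, 0.5],  # N_O2 + N_H2O / 2
]


def apply(A, x):
    """Return the matrix-vector product A x as a list."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def feasible_state(u, v, n_h2, n_o2, n_h2o):
    """State containing n_h2o water molecules that respects all constraints."""
    return [u, v, n_h2 - n_h2o, n_o2 - 0.5 * n_h2o, n_h2o]


# Hypothetical initial data (arbitrary units).
u1, u2, v1, v2, n_h2, n_o2 = 3.0, 2.0, 1.0, 1.0, 4.0, 3.0
B = [u1 + u2, v1 + v2, n_h2, n_o2]

# N_H2O can grow from 0 until either H2 or O2 is used up.
max_water = min(n_h2, 2.0 * n_o2)
states = [feasible_state(u1 + u2, v1 + v2, n_h2, n_o2, w)
          for w in (0.0, 1.0, 2.5, max_water)]
for x in states:
    print(dict(zip(NAMES, x)), "A x =", apply(A, x))
```

Every state in the list satisfies the same four constraints, which illustrates that $N_{\text{H}_2\text{O}}$ parametrizes a one-dimensional family of admissible joint states.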

Keeping the above discussion in mind, the meaning of Postulate 3(i) is that the entropy of the joint system can be decomposed into entropies of the subsystems, $\displaystyle S_{12}(X^{(12)}) = S_1(X^{(1)}) + S_2(X^{(2)})$,

where the entropy of a subsystem is a function of only the extensive parameters of that subsystem. Here, $S_1$ and $S_2$ are defined on the same domain $E_1=E_2$ and are the same function.

Concavity of the entropy function

Postulate 2 asserts that the joint equilibrium state of two interacting subsystems is the solution $X^{(12)*}=(X^{(1)*},X^{(2)*})$ to the optimization problem $\displaystyle \begin{array}{cl} \max & S_{12}(X^{(12)}) = S_1(X^{(1)}) + S_2(X^{(2)}) \\ \text{subject to} & A X = B \end{array}$

Here the matrix $A$ defines which linear combinations of variables are constrained. Often, but not always, the constraint is simply that $X = X^{(1)}+X^{(2)}$ is constant. Note that there were two constraints of a more general form in the above example with the vessels of hydrogen molecules and oxygen molecules. Defining the function $\displaystyle S'(X) = \sup_{X^{(1)}\in E_1, X-X^{(1)}\in E_2} (S_1(X^{(1)}) + S_2(X-X^{(1)}))$

it is now possible to reexpress the optimization problem as $\displaystyle \begin{array}{cl} \max & S'(X) \\ \text{subject to} & A X = B \end{array}$

Now temporarily assume that $A$ is invertible so that $X$ is uniquely determined from the constraints on the joint system. In that case $S'(X) = S'(X^{(1)}+X^{(2)})$ may be considered the entropy of a joint system formed by bringing into contact two isolated systems, initially in states $X^{(1)}$ and $X^{(2)}$, respectively. Furthermore, the function $S'$ may be identified with the subsystem entropy functions $S_1 = S_2$, because it is defined on the same domain and it represents the same physical quantity. Writing $S = S' = S_1 = S_2$, it now follows that when two systems are brought into contact, with resulting changes of states from $X^{(s)}$ to $X^{(s)*}$, $s=1,2$, the entropy of the joint system is $\displaystyle \begin{array}{rl} S_{12}(X^{(1)*},X^{(2)*}) & = S(X^{(1)*}) + S(X^{(2)*}) = S(X^{(1)} + X^{(2)}) \\ & \geq S(X^{(1)}) + S(X^{(2)}) \end{array}$

where the inequality follows from the expression for $S'=S$ in terms of a supremum, and the expression as a whole holds for all $X^{(1)}+X^{(2)} = X^{(1)*}+X^{(2)*}$. The last inequality, $S(X^{(1)} + X^{(2)}) \geq S(X^{(1)}) + S(X^{(2)})$, holds in full generality. Applying it to the scaled states $\zeta X^{(1)}$ and $(1-\zeta) X^{(2)}$ and using homogeneity (Postulate 3(ii)), it follows that entropy is a concave function, i.e. $S(\zeta X^{(1)} + (1 - \zeta) X^{(2)}) \geq \zeta S(X^{(1)}) + (1-\zeta) S(X^{(2)})$,

for all $0 \leq \zeta \leq 1$.
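The chain from homogeneity and superadditivity to concavity can be checked numerically on a concrete entropy function. The sketch below assumes an ideal-gas-like entropy, $S = N(\log(V/N) + \tfrac{3}{2}\log(U/N))$ in units where Boltzmann's constant is 1 and with additive constants dropped. This model is not taken from the text above; it is only a convenient example that satisfies Postulate 3 for strictly positive arguments.

```python
import math
import random


def entropy(U, V, N):
    """Assumed ideal-gas-like entropy (k_B = 1, additive constants dropped).

    Requires U, V, N > 0.
    """
    return N * (math.log(V / N) + 1.5 * math.log(U / N))


def add(x, y):
    return tuple(a + b for a, b in zip(x, y))


def scale(z, x):
    return tuple(z * a for a in x)


random.seed(0)
worst_homog = 0.0          # largest violation of S(z X) = z S(X)
worst_super = math.inf     # smallest S(X1 + X2) - S(X1) - S(X2)
worst_concave = math.inf   # smallest concavity gap
for _ in range(10_000):
    x1 = tuple(random.uniform(0.1, 10.0) for _ in range(3))
    x2 = tuple(random.uniform(0.1, 10.0) for _ in range(3))

    # Homogeneity: S(k X) = k S(X) for k > 0.
    k = random.uniform(0.1, 5.0)
    worst_homog = max(worst_homog,
                      abs(entropy(*scale(k, x1)) - k * entropy(*x1)))

    # Superadditivity: S(X1 + X2) >= S(X1) + S(X2).
    worst_super = min(worst_super,
                      entropy(*add(x1, x2)) - entropy(*x1) - entropy(*x2))

    # Concavity: S(z X1 + (1 - z) X2) >= z S(X1) + (1 - z) S(X2).
    z = random.uniform(0.0, 1.0)
    mix = add(scale(z, x1), scale(1.0 - z, x2))
    gap = entropy(*mix) - z * entropy(*x1) - (1.0 - z) * entropy(*x2)
    worst_concave = min(worst_concave, gap)

print(worst_homog, worst_super, worst_concave)
```

Up to rounding error, the homogeneity violation should be zero and the two gaps non-negative. A random test of this kind is of course no proof; it only illustrates the inequalities for one assumed model.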

Turning to the somewhat artificial case when $A$ is not invertible and the joint system is therefore not constrained to the states of constant $X^{(1)}+X^{(2)}$, but can relax its state further (perhaps by being able to exchange particles with its environment), the above discussion remains valid with the following modification $\displaystyle \begin{array}{rl} S_{12}(X^{(1)*},X^{(2)*}) & = S(X^{(1)*}) + S(X^{(2)*}) \geq S(X^{(1)} + X^{(2)}) \\ & \geq S(X^{(1)}) + S(X^{(2)}) \end{array}$

The last inequality remains unchanged and concavity follows as before.

Conclusion: The entropy is a concave function of the extensive variables.

From concavity it follows that the entropy has no minima (except at the boundary of the manifold of equilibrium states), no saddle points, and all local maxima are also global maxima. When the entropy function is differentiable, the state of maximum entropy may therefore be determined by seeking stationary points of the entropy (or, more precisely, of a Lagrangian taking the constraints into account).
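As an illustration of the maximum entropy principle, consider two subsystems that can exchange energy only. The sketch below again assumes the ideal-gas-like entropy $S = N(\log(V/N) + \tfrac{3}{2}\log(U/N))$ with $k_B = 1$, which is not part of the text above, and uses a hypothetical helper `maximize_concave` (a plain ternary search) in place of a Lagrangian calculation. All numerical values are arbitrary.

```python
import math


def entropy(U, V, N):
    """Assumed ideal-gas-like entropy (k_B = 1, additive constants dropped)."""
    return N * (math.log(V / N) + 1.5 * math.log(U / N))


def maximize_concave(f, lo, hi, iterations=200):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


# Hypothetical subsystems with fixed volumes and particle numbers.
V1, N1 = 1.0, 2.0
V2, N2 = 3.0, 1.0
U_total = 9.0  # the only constraint: U1 + U2 = U_total


def joint_entropy(U1):
    """Additive entropy of the joint system as a function of the split."""
    return entropy(U1, V1, N1) + entropy(U_total - U1, V2, N2)


U1_star = maximize_concave(joint_entropy, 1e-9, U_total - 1e-9)
U2_star = U_total - U1_star

# For this model dS/dU = 1.5 * N / U, so the stationary point has equal
# energy per particle in the two subsystems.
print(U1_star / N1, U2_star / N2)
```

Because the joint entropy is concave in the split, the search cannot get stuck in a spurious local maximum, which is the practical content of the statement above.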

Energy representation

At this point it is useful to change the notation slightly. In order to clearly distinguish functions from function arguments, the entropy function and the internal energy function will in this section be denoted by the calligraphic symbols $\mathcal{S}$ and $\mathcal{U}$, respectively. The fact that $\mathcal{S}(U,X_{2:n})$ is a strictly increasing function of the internal energy (see Postulate 3(iii)) enables the internal energy function $\mathcal{U}(S,X_{2:n})$ to be defined implicitly through the equation $\mathcal{S}(\mathcal{U}(S,X_{2:n}), X_{2:n}) = S$.

Strict monotonicity of the entropy function guarantees that this equation has a unique solution. From the concavity of $\mathcal{S}$ it now follows that for the convex combinations $S = \zeta S_1 + (1-\zeta)S_2$ and $Z = \zeta X + (1-\zeta)Y$, with $0 \leq \zeta \leq 1$, $\displaystyle \begin{array}{rl} \mathcal{S}(\mathcal{U}(S,Z_{2:n}),Z_{2:n}) & = S \\ & = \zeta \mathcal{S}(\mathcal{U}(S_1,X_{2:n}),X_{2:n}) + (1-\zeta) \mathcal{S}(\mathcal{U}(S_2,Y_{2:n}),Y_{2:n}) \\ & \leq \mathcal{S}(\zeta\mathcal{U}(S_1,X_{2:n}) + (1-\zeta)\mathcal{U}(S_2,Y_{2:n}),Z_{2:n}) \end{array}$

Using the monotonicity of the entropy function to “invert” this relation now yields $\mathcal{U}(S,Z_{2:n}) \leq \zeta\mathcal{U}(S_1,X_{2:n}) + (1-\zeta)\mathcal{U}(S_2,Y_{2:n})$.

Thus, the internal energy is a convex function. Instead of maximizing the entropy subject to the constraint that the sum of all subsystem energies is constant (and other constraints not involving energy), one may equivalently minimize the internal energy subject to the constraint that the sum of all subsystem entropies is constant (and other constraints unchanged).
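The implicit definition of the internal energy can be made concrete by solving $\mathcal{S}(U, X_{2:n}) = S$ for $U$ numerically. The sketch below does this by bisection for the assumed ideal-gas-like entropy $S = N(\log(V/N) + \tfrac{3}{2}\log(U/N))$ (with $k_B = 1$; a model chosen for illustration, not taken from the text) and then tests the convexity inequality at one convex combination of two hypothetical states. The function names and the bisection bracket are arbitrary choices.

```python
import math


def entropy(U, V, N):
    """Assumed ideal-gas-like entropy (k_B = 1, additive constants dropped)."""
    return N * (math.log(V / N) + 1.5 * math.log(U / N))


def energy(S, V, N, lo=1e-12, hi=1e12, iterations=200):
    """Solve entropy(U, V, N) = S for U by bisection.

    Bisection works because the entropy is strictly increasing in U.
    The bracket [lo, hi] is an arbitrary choice for this sketch.
    """
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)  # geometric midpoint: U spans many decades
        if entropy(mid, V, N) < S:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def energy_closed_form(S, V, N):
    """Closed-form inverse of the assumed entropy, used only as a cross-check."""
    return N * math.exp((2.0 / 3.0) * (S / N - math.log(V / N)))


# Two hypothetical states (S, V, N) and one convex combination of them.
a = (1.0, 2.0, 1.0)
b = (4.0, 1.0, 3.0)
zeta = 0.3
mid_state = tuple(zeta * p + (1.0 - zeta) * q for p, q in zip(a, b))

lhs = energy(*mid_state)
rhs = zeta * energy(*a) + (1.0 - zeta) * energy(*b)
print(lhs, rhs, lhs <= rhs)
```

The bisection step is the numerical counterpart of "inverting" the entropy with respect to its first argument, and the final comparison is a single instance of the convexity inequality derived above.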

Conclusion: Energy-constrained maximization of entropy is equivalent to entropy-constrained minimization of internal energy. Both methods yield the equilibrium state of a thermodynamic system.

From this principle one recovers as a special case the mechanical condition for equilibrium. Mechanical equilibrium is attained when the potential energy is minimized. At zero temperature the internal energy coincides with the potential energy and the thermodynamical and mechanical equilibrium conditions are equivalent.

Intensive variables

When the internal energy is differentiable one may define an intensive variable for each extensive variable, $\displaystyle \lambda_1 = T = \frac{\partial U(S,X_{2:n})}{\partial S}$, $\displaystyle \lambda_i = \frac{\partial U(S,X_{2:n})}{\partial X_i}, \quad 2\leq i \leq n$.

The derivative w.r.t. volume yields the (negative) pressure, the derivative w.r.t. a particle number yields the corresponding chemical potential, the derivative w.r.t. an external electric field yields the polarization, the derivative w.r.t. an external magnetic field yields the magnetization, derivatives w.r.t. strain deformations yield the stress tensor, and so on. Most intensive quantities are familiar from other branches of physics. The temperature (derivative w.r.t. entropy) is special in that it has no analogue in other branches of physics. Intensive variables provide a convenient way to express equilibrium conditions. For example, two fully interacting subsystems are in equilibrium when all their intensive variables are equal.
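The definitions of the intensive variables can be tried out with finite differences. The sketch below assumes the ideal-gas-like internal energy $U(S,V,N) = N\exp\left(\tfrac{2}{3}(S/N - \log(V/N))\right)$ in units with $k_B = 1$, which is an illustrative model and not something fixed by the text. The helper `partial` and the chosen state are hypothetical.

```python
import math


def energy(S, V, N):
    """Assumed ideal-gas-like internal energy U(S, V, N), with k_B = 1."""
    return N * math.exp((2.0 / 3.0) * (S / N - math.log(V / N)))


def partial(f, args, index, h=1e-6):
    """Central finite-difference estimate of a partial derivative of f."""
    up = list(args)
    down = list(args)
    up[index] += h
    down[index] -= h
    return (f(*up) - f(*down)) / (2.0 * h)


state = (2.0, 1.5, 1.0)  # hypothetical (S, V, N)
S, V, N = state

T = partial(energy, state, 0)    # temperature, dU/dS
P = -partial(energy, state, 1)   # pressure, minus dU/dV
mu = partial(energy, state, 2)   # chemical potential, dU/dN

# For this particular model the derivatives reproduce P V = N T
# and U = 1.5 N T.
print(T, P, mu)
print(P * V, N * T)
print(energy(*state), 1.5 * N * T)
```

For this model the temperature and pressure obtained as derivatives satisfy the familiar ideal gas relations, which is a useful sanity check on the sign conventions.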

If the internal energy is not differentiable, one may introduce intensive variables as the new variables that are introduced when the internal energy is Legendre-Fenchel transformed (see the previous post on convex analysis).

Thermodynamic potentials and Massieu functions

Thermodynamic potentials are partial Legendre-Fenchel transforms of the internal energy. Transforming the internal energy w.r.t. entropy yields $\displaystyle U^*_1(T,X_{2:n}) = \sup_{S\geq 0} \left(TS - U(S,X_{2:n})\right)$.

The new variable $T$ can be identified with the temperature and for a differentiable internal energy function it will coincide with the definition in terms of a derivative. A partial Legendre-Fenchel transform flips the convexity/concavity property of a function. The internal energy is convex in all extensive variables, while the transformed function, taken with the sign convention $-U^*_1 = \inf_{S\geq 0}\left(U(S,X_{2:n}) - TS\right)$ that is usual for thermodynamic potentials, is concave in the temperature and convex in the remaining (extensive) variables. In general, thermodynamic potentials defined with this sign are concave in the intensive variables and convex in the extensive variables.
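The transform with respect to entropy can also be evaluated numerically. The sketch below assumes the ideal-gas-like internal energy $U(S) = \exp(2S/3)$ at $V = N = 1$ (with $k_B = 1$; an illustrative model, not taken from the text), truncates the supremum at an arbitrary cutoff `S_max`, and uses a ternary search because $TS - U(S)$ is concave in $S$. It also evaluates $\inf_S (U - TS)$, the form in which the Helmholtz free energy is usually written, and checks a midpoint inequality in $T$ for it.

```python
import math


def energy(S, V=1.0, N=1.0):
    """Assumed ideal-gas-like internal energy U(S, V, N), with k_B = 1."""
    return N * math.exp((2.0 / 3.0) * (S / N - math.log(V / N)))


def transform(T, S_max=50.0, iterations=200):
    """Numerical sup over 0 <= S <= S_max of T * S - U(S).

    The objective is concave in S because U is convex, so a ternary
    search locates the supremum. S_max is an arbitrary cutoff.
    Returns the pair (supremum, maximizing S).
    """
    lo, hi = 0.0, S_max
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if T * m1 - energy(m1) < T * m2 - energy(m2):
            lo = m1
        else:
            hi = m2
    S_star = 0.5 * (lo + hi)
    return T * S_star - energy(S_star), S_star


def helmholtz(T):
    """Negative of the transform: inf over S of U(S) - T * S."""
    return -transform(T)[0]


T = 2.0
value, S_star = transform(T)

# At the maximizer, T should match the slope dU/dS = (2/3) * U(S).
print(S_star, (2.0 / 3.0) * energy(S_star))

# Midpoint test in T for the negative transform.
T_a, T_b = 1.0, 3.0
print(helmholtz(0.5 * (T_a + T_b)), 0.5 * (helmholtz(T_a) + helmholtz(T_b)))
```

The first printed pair shows that the maximizing entropy is the one at which the slope of the internal energy equals the prescribed temperature, in line with the identification of $T$ made above.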

The Legendre-Fenchel transformed functions, called thermodynamic potentials, are primarily useful in situations when the thermodynamic system of interest is in equilibrium with an environment with known intensive parameters. For example, if the system is in equilibrium with an environment with known temperature and pressure, it is very useful to perform two Legendre-Fenchel transforms that replace entropy and volume by temperature and pressure. The resulting thermodynamic potential is called the Gibbs free energy, and the equilibrium properties of the system follow from holding temperature and pressure fixed while minimizing the free energy w.r.t. the remaining variables.

Massieu functions are analogous to thermodynamic potentials, but they arise from Legendre-Fenchel transforms of the (negative) entropy rather than of the internal energy.

When postulates fail: monotonicity and spin chains

Spin chains in external magnetic fields provide a simple example of how the postulates above can fail. Consider a spin chain consisting of $N$ spins that interact with an external magnetic field $B$. Readers not familiar with the quantum mechanical concept of spin may think of microscopic magnets having the peculiar property that they are either parallel or anti-parallel to the magnetic field. Letting $N_0$ and $N_1$, with $N_0+N_1=N$, denote the number of spins that are parallel and anti-parallel to the magnetic field, respectively, the internal energy of the system is given by $U = (N_1-N_0) \mu B$

where $\mu$ is a constant. In statistical mechanics, the Boltzmann entropy is defined as the logarithm of the number of microstates consistent with a given thermodynamic equilibrium state. The number of microstates consistent with a magnetization $\mu (N_1-N_0)$ is given by $\displaystyle \left(\begin{array}{cc} N \\ N_0 \end{array} \right) = \frac{N!}{N_0! N_1!}$

and the Boltzmann entropy is $\displaystyle S_{\text{B}} = \log(N!) - \log(N_0!) - \log(N_1!)$.

By varying the number of parallel spins over the interval $0 \leq N_0 \leq N$ one obtains a discrete set of points $(U(N,N_0),S_{\text{B}}(N,N_0))$ that define the Boltzmann entropy as a function of the internal energy. A complication arises here since the notions of convexity and concavity used above are only defined for functions on continuous domains, while the entropy is only defined for a discrete domain of energy values. However, extending the Boltzmann entropy function through interpolation circumvents this problem. (Alternatively, a fully quantum mechanical treatment using the von Neumann entropy also circumvents this problem, without resolving the underlying failure of the thermodynamic postulates.) The underlying problem is instead that the Boltzmann entropy $S_{\text{B}}(U)$ is not an increasing function of the internal energy, in contradiction with Postulate 3(iii). Plotting the points $(U(N,N_0),S_{\text{B}}(N,N_0))$, as done on the left in the figure below for the case $N = 100$, reveals that the Boltzmann entropy reaches a maximum at zero internal energy, and decreases as the internal energy is increased further.

[Figure: Plots of Boltzmann entropy (left), coldness (middle), and temperature (right) as functions of the spin chain internal energy. Arbitrary units.]
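The points $(U, S_{\text{B}})$ for $N = 100$ are easy to generate directly from the formulas above. The sketch below is one way to do it; it sets $\mu B = 1$ (arbitrary units, an assumption for illustration) and uses the log-gamma function to evaluate the logarithms of the factorials. The function names are hypothetical.

```python
import math


def boltzmann_entropy(N, N0):
    """log of the binomial coefficient N choose N0, via log-gamma."""
    return math.lgamma(N + 1) - math.lgamma(N0 + 1) - math.lgamma(N - N0 + 1)


def internal_energy(N, N0, mu_B=1.0):
    """U = (N1 - N0) * mu * B with N1 = N - N0; mu_B = mu * B is set to 1."""
    return (N - 2 * N0) * mu_B


N = 100
# Decreasing N0 corresponds to increasing internal energy.
points = [(internal_energy(N, N0), boltzmann_entropy(N, N0))
          for N0 in range(N, -1, -1)]

U_at_max, S_max = max(points, key=lambda p: p[1])
print(U_at_max, S_max)

# Entropy rises for U < 0 and falls for U > 0, so it is not monotone in U.
pairs = list(zip(points, points[1:]))
rising = all(s2 > s1 for (u1, s1), (u2, s2) in pairs if u2 <= 0)
falling = all(s2 < s1 for (u1, s1), (u2, s2) in pairs if u1 >= 0)
print(rising, falling)
```

The two boolean checks make the violation of Postulate 3(iii) explicit: the entropy increases with energy only on the negative-energy half of the curve.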

The slope at different points on the entropy curve is the coldness $\beta = 1/T$ of the system, defined as the inverse temperature. At the maximum of the entropy, the coldness passes through zero and the temperature diverges. When the maximum is approached from the left, the temperature tends towards $+\infty$ and the spin chain becomes infinitely hot! For positive values of the internal energy the temperature is negative, and as the maximum is approached from the right the temperature tends towards $-\infty$. It is easier to think of this in terms of coldness, which decreases monotonically with increasing internal energy. Thus, a higher internal energy always corresponds to a lower coldness, i.e. a hotter system. In a sense, $T\to 0^-$ is the hottest of all temperature limits, while $T\to -\infty$ is a colder limit that is nevertheless hotter than any positive temperature. Heat spontaneously flows from a system with negative temperature to any system with positive temperature and the negative temperature states that occur for $U > 0$ are perhaps best thought of as some kind of pseudo-equilibrium states, rather than true equilibrium states.
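The behaviour of the coldness can be reproduced with a centered difference of the Boltzmann entropy. The sketch below is an illustration under stated assumptions: it sets $\mu B = 1$, and it uses the fact that the difference of the entropies at $N_0 - 1$ and $N_0 + 1$ simplifies to the logarithm of a ratio of integers, which follows from the factorial formula for the binomial coefficient. The function name `coldness` is hypothetical.

```python
import math


def coldness(N, N0, mu_B=1.0):
    """Centered-difference estimate of beta = dS/dU at N0 parallel spins.

    Flipping one spin from parallel to anti-parallel raises U by 2 * mu_B,
    so the states with N0 - 1 and N0 + 1 parallel spins differ in energy
    by 4 * mu_B. Their entropy difference is
    log(C(N, N0 - 1) / C(N, N0 + 1))
        = log(N0 * (N0 + 1) / ((N - N0 + 1) * (N - N0))).
    Valid for 0 < N0 < N; mu_B = mu * B is set to 1 (arbitrary units).
    """
    ratio = (N0 * (N0 + 1)) / ((N - N0 + 1) * (N - N0))
    return math.log(ratio) / (4.0 * mu_B)


N = 100
# Coldness at all interior points, ordered by increasing internal energy.
betas = [coldness(N, N0) for N0 in range(N - 1, 0, -1)]

for N0 in (90, 60, 50, 40, 10):
    U = N - 2 * N0
    beta = coldness(N, N0)
    T = math.inf if beta == 0.0 else 1.0 / beta
    print(f"U = {U:+4d}  beta = {beta:+.4f}  T = {T:+.3f}")
```

In this discretization the coldness is positive for $U < 0$, vanishes at $U = 0$, and is negative for $U > 0$, while decreasing monotonically throughout, which is the ordering from cold to hot described above.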

The non-monotonicity of the Boltzmann entropy for the spin chain also means that the maximum entropy principle cannot be equivalently reexpressed as a minimum internal energy principle.