These are some notes on the differentiability properties of the pointwise maximum of finitely many functions, based mainly on results from the book of Borwein and Lewis and from Rockafellar and Wets's "Variational Analysis".

Directional Differentiability

Let $ X$ be a finite-dimensional real Euclidean space and let $ g_i: C \to \mathbb{R}$, for $ i\in\{1, \ldots, N\}$, be continuous functions, where $ C \subseteq X$ is a set with nonempty interior. It is convenient to define $ \mathcal{N} = \{1,\ldots, N\}$. Let $ \bar{x}$ be a point in the interior of $ C$. Define the function

$$\begin{align} g(x) = \max_{i\in\mathcal{N}}\{g_i(x)\}.\end{align}$$

Suppose, further, that the functions $ g_i$ are Gâteaux differentiable at $ \bar{x}$.

For a function $ f: X \to \mathbb{R}$ we define its directional derivative at a point $ \bar{x}$ along a direction $ d \in X$ as the following one-sided limit, whenever it exists,

$$\begin{align}f'(\bar{x}; d) = \lim_{t \downarrow 0} \frac{f(\bar{x} + td) - f(\bar{x})}{t}.\tag{$*$}\end{align}$$

The following proposition gives the directional derivative of $ g$ along a direction $ d\in X$.

Proposition 1. Let $ \mathcal{I}_{\bar{x}} = \{i \in \mathcal{N} {}:{} g_i(\bar{x}) = g(\bar{x})\}$ be the set of indices that are active at $ \bar{x}$. Then, for all $ d\in X$, the directional derivative of $ g$ at $ \bar{x}$ along the direction $ d$ is

$$\begin{align}g'(\bar{x}; d) = \max_{i\in\mathcal{I}_{\bar{x}}} {\langle \nabla g_i(\bar{x}), d\rangle}.\tag{1}\end{align}$$

Proof. Let us have a look at the following figure

Function g and a point where some of the functions return the same (maximum) value while other ones give a smaller value.

At the point $ \bar{x}$ we have $ g(\bar{x}) = g_i(\bar{x})$ for all $ i \in \mathcal{I}_{\bar{x}}$ and $ g(\bar{x}) > g_i(\bar{x})$ for all $ i \not\in \mathcal{I}_{\bar{x}}$. By the continuity of the $ g_i$, there is a neighbourhood $ \mathcal{U}_{\bar{x}}$ of $ \bar{x}$ on which $ g(x) > g_i(x)$ for all $ i \not\in \mathcal{I}_{\bar{x}}$; therefore, the directional derivative at $ \bar{x}$ is not affected by those $ g_i$ with $ i \not\in \mathcal{I}_{\bar{x}}$, so, without loss of generality, we may assume that $ \mathcal{I}_{\bar{x}} = \mathcal{N}$.

Next, we have that

$$\begin{align}\liminf_{t \downarrow 0} \frac{g(\bar{x} + td) - g(\bar{x})}{t} {}\geq{}& \liminf_{t \downarrow 0} \frac{g_i(\bar{x} + td) - g(\bar{x})}{t} \\ {}={}& \liminf_{t \downarrow 0} \frac{g_i(\bar{x} + td) - g_i(\bar{x})}{t}\end{align}$$

that is, for every $ i \in \mathcal{N}$,

$$\liminf_{t \downarrow 0} \frac{g(\bar{x} + td) - g(\bar{x})}{t} {}\geq{} \left\langle \nabla g_i(\bar{x}), d \right\rangle,\tag{2}$$

and, taking the maximum over $ i$, the left-hand side of (2) is at least $ \max_{i\in\mathcal{N}}\left\langle \nabla g_i(\bar{x}), d \right\rangle$.

It remains to show that

$$\begin{align}\limsup_{t \downarrow 0} \frac{g(\bar{x} + td) - g(\bar{x})}{t} {}\leq{} \max_{i\in\mathcal{N}}\left\langle \nabla g_i(\bar{x}), d \right\rangle.\tag{3}\end{align}$$

To derive a contradiction, assume instead that

$$\begin{align}\limsup_{t \downarrow 0} \frac{g(\bar{x} + td) - g(\bar{x})}{t} {}>{} \max_{i\in\mathcal{N}}\left\langle \nabla g_i(\bar{x}), d \right\rangle,\end{align}$$

that is, there is $ \epsilon > 0$ and a sequence $ (t_k)_k$ with $ t_k \downarrow 0$, such that

$$\begin{align}\frac{g(\bar{x} + t_k d) - g(\bar{x})}{t_k} {}\geq{} \max_{i\in\mathcal{N}}\left\langle \nabla g_i(\bar{x}), d \right\rangle + \epsilon.\end{align}$$

For each $ k$, let $ j_k \in \mathcal{N}$ (which depends on $ t_k$) be such that $ g(\bar{x} + t_k d) = g_{j_k}(\bar{x} + t_k d)$. Since the indices $ j_k$ take values in the finite set $ \mathcal{N}$, there is a subsequence $ (t_{\nu_k})_{k \in \mathbb{N}}$ of $ (t_k)_{k\in\mathbb{N}}$ along which $ j_{\nu_k} = \iota$ for some fixed index $ \iota$; recalling that $ g(\bar{x}) = g_{\iota}(\bar{x})$, we have

$$\begin{align}\frac{g_{\iota}(\bar{x} + t_{\nu_k} d) - g_{\iota}(\bar{x})}{t_{\nu_k}} {}\geq{} \max_{i\in\mathcal{N}}\left\langle \nabla g_i(\bar{x}), d \right\rangle + \epsilon.\end{align}$$

Letting $ k \to \infty$ (so that $ t_{\nu_k} \downarrow 0$) and using the Gâteaux differentiability of $ g_{\iota}$ at $ \bar{x}$, we obtain

$$\begin{align} \langle \nabla g_{\iota}(\bar{x}), d \rangle {}\geq{} \max_{i\in\mathcal{N}}\left\langle \nabla g_i(\bar{x}), d \right\rangle + \epsilon,\end{align}$$

which is a contradiction! Therefore,

$$\begin{align}\limsup_{t \downarrow 0} \frac{g(\bar{x} + td) - g(\bar{x})}{t} {}\overset{(3)}{\leq}{} \max_{i\in\mathcal{N}}\left\langle \nabla g_i(\bar{x}), d \right\rangle {}\overset{(2)}{\leq}{} \liminf_{t \downarrow 0} \frac{g(\bar{x} + td) - g(\bar{x})}{t},\end{align}$$

so the one-sided limit in ($*$) exists and equals $ \max_{i\in\mathcal{N}}\langle \nabla g_i(\bar{x}), d\rangle$. This completes the proof. $ \blacksquare$
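As a quick numerical sanity check of (1), here is a small Python sketch; the two functions, the point $\bar{x}$ and the direction $d$ are arbitrary choices (not from the notes), picked so that both functions are active at $\bar{x}$.

```python
import numpy as np

# Two arbitrary smooth functions (hypothetical example), both active at
# x_bar = (1, 1), i.e., g1(x_bar) = g2(x_bar) = 2, with different gradients.
def g1(x): return x[0] + x[1] ** 2
def g2(x): return x[1] + x[0] ** 2

def grad_g1(x): return np.array([1.0, 2.0 * x[1]])
def grad_g2(x): return np.array([2.0 * x[0], 1.0])

def g(x): return max(g1(x), g2(x))

x_bar = np.array([1.0, 1.0])
d = np.array([0.3, -1.0])    # arbitrary direction

# One-sided finite-difference approximation of g'(x_bar; d), cf. (*)
t = 1e-7
fd = (g(x_bar + t * d) - g(x_bar)) / t

# Formula (1): maximum of <grad g_i(x_bar), d> over the active indices
formula = max(grad_g1(x_bar) @ d, grad_g2(x_bar) @ d)

print(fd, formula)    # both should be approximately -0.4
```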

Note: One should not assume that there exist $ \epsilon>0$ and an index $ i \in \mathcal{I}_{\bar{x}}$ such that $ g(\bar{x} + t d) = g_{i}(\bar{x} + t d)$ for all $ t \in [0, \epsilon)$, that is, that a single function attains the maximum along the whole segment. As a counterexample, consider the functions

$$\begin{align}g_1(x) {}={}& \begin{cases}x^3 \sin\tfrac{1}{x}, &\text{ for } x > 0, \\ 0, & \text{ for } x \leq 0,\end{cases} \\ g_2(x) {}={}& -g_1(x).\end{align}$$

Functions g1 and g2 are differentiable at 0, but in all intervals of the form (0, ε), none of them dominates the other.

This is what the derivative of $g_1$ looks like:

Derivative of g_1. The derivative at 0 is equal to 0.
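A small numerical illustration of this (a Python sketch; the sampling grid and the step sizes are arbitrary choices): the sign of $g_1$ keeps alternating arbitrarily close to $0$, so on no interval $(0, \epsilon)$ does one of $g_1$, $g_2$ coincide with the maximum, while the difference quotients of $g_1$ at $0$ still tend to $0$.

```python
import numpy as np

def g1(x):
    # x^3 sin(1/x) for x > 0, and 0 otherwise
    return x ** 3 * np.sin(1.0 / x) if x > 0.0 else 0.0

# Sign of g1 on a grid inside (0, 0.01): it keeps alternating, so on no
# interval (0, eps) does g1 dominate g2 = -g1 (nor the other way round).
xs = np.linspace(2e-3, 1e-2, 4001)
signs = np.sign([g1(x) for x in xs])
print("sign changes detected:", int(np.sum(signs[1:] != signs[:-1])))

# Difference quotients of g1 at 0 are bounded by t^2, hence tend to 0,
# so g1 is differentiable at 0 with g1'(0) = 0 (and likewise g2'(0) = 0).
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    print(t, g1(t) / t)
```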

Fritz John Optimality Conditions

Proposition 1 is very useful for proving the Fritz John optimality conditions for constrained minimisation problems with differentiable constraints and cost function. Consider the following optimisation problem

$$\begin{align} \mathbb{P} {}:{} \mathrm{Minimise}_{x \in X} &f(x) \\ \text{subject to: } & g_i(x) \leq 0, i\in \mathcal{N}.\end{align}$$

For every feasible $ x \in X$, that is, every $ x$ such that $ g_i(x) \leq 0$ for all $ i \in \mathcal{N}$, we define the set of active indices as $ \mathcal{I}(x) = \{i \in \mathcal{N} {}:{} g_i(x) = 0\}$.

Observation. The Fritz John optimality conditions rely on the realisation that if a point $ \bar{x}$ is a local minimiser of $ \mathbb{P}$ then it is a local minimiser of the following function

$$\begin{align} g(x) = \max\{f(x) - f(\bar{x}), g_i(x), i \in \mathcal{I}(\bar{x})\}.\end{align}$$

Indeed, since $ \bar{x}$ is a local minimiser of $ \mathbb{P}$, there is an $ \epsilon > 0$ such that $ f(x) \geq f(\bar{x})$ for all $ x \in \mho_{\bar{x}, \epsilon} = \{x \in X {}:{} g_i(x) \leq 0, i \in \mathcal{N}, \|x - \bar{x}\| < \epsilon\}$. Note that $ g(\bar{x}) = 0$. Moreover, by continuity, after possibly shrinking $ \epsilon$, the inactive constraints satisfy $ g_i(x) < 0$ whenever $ \|x - \bar{x}\| < \epsilon$. If some $ x$ with $ \|x - \bar{x}\| < \epsilon$ had $ g(x) < 0 = g(\bar{x})$, then $ x$ would be feasible and would satisfy $ f(x) < f(\bar{x})$, contradicting local optimality. Hence $ g(x) \geq g(\bar{x})$ for all $ x$ with $ \|x - \bar{x}\| < \epsilon$, that is, $ \bar{x}$ is a local minimiser of $ g$.

Theorem 2 (Fritz John Optimality Conditions). Consider the optimisation problem $ \mathbb{P}$, where $ f$ and $ g_i$ are assumed to be differentiable at a point $ \bar{x}$ that is a local minimiser of the problem. Then, there exist $ \lambda_0 \geq 0$ and $ \lambda_i \geq 0$, $ i \in \mathcal{I}(\bar{x})$, not all equal to 0, such that

$$\begin{align}\lambda_0 \nabla f(\bar{x}) + \sum_{i\in\mathcal{I}(\bar{x})}\lambda_i \nabla g_i(\bar{x}) = 0.\end{align}$$

Proof. As discussed above, since $ \bar{x}$ is a local minimiser of $ \mathbb{P}$, it is also a local minimiser of $ g$; therefore, for all directions $ d \in X$, $ g'(\bar{x}; d) \geq 0$, that is, by Proposition 1,

$$\begin{align}g'(\bar{x}; d) = \max\{\langle \nabla f(\bar{x}), d \rangle, \langle \nabla g_i(\bar{x}), d \rangle, i \in \mathcal{I}(\bar{x})\} \geq 0.\end{align}$$

Equivalently, there is no $ d \in X$ such that

$$\begin{align}\langle \nabla f(\bar{x}), d \rangle < 0, \text{ and } \langle \nabla g_i(\bar{x}), d \rangle < 0 \text{ for all } i \in \mathcal{I}(\bar{x}).\end{align}$$

By Gordan's theorem (essentially, by virtue of the hyperplane separation theorem), the nonexistence of such a direction $ d$ is equivalent to the existence of $ \lambda_0, \lambda_i \geq 0$, $ i \in \mathcal{I}(\bar{x})$, not all zero, such that $ \lambda_0 \nabla f(\bar{x}) + \sum_{i\in\mathcal{I}(\bar{x})}\lambda_i \nabla g_i(\bar{x}) = 0$, which completes the proof. $ \blacksquare$
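As a toy illustration of Theorem 2 (not from the notes), consider minimising $f(x) = x_1 + x_2$ subject to $g_1(x) = x_1^2 + x_2^2 - 1 \leq 0$: the minimiser is $\bar{x} = (-1/\sqrt{2}, -1/\sqrt{2})$, where the constraint is active, and the Fritz John condition holds with $\lambda_0 = 1$ and $\lambda_1 = 1/\sqrt{2}$. The Python sketch below verifies this numerically.

```python
import numpy as np

# Toy problem (hypothetical example): minimise f(x) = x1 + x2
# subject to g1(x) = x1^2 + x2^2 - 1 <= 0.
x_bar = np.array([-1.0, -1.0]) / np.sqrt(2.0)   # minimiser; g1 is active here

grad_f = np.array([1.0, 1.0])                   # gradient of f at x_bar
grad_g1 = 2.0 * x_bar                           # gradient of g1 at x_bar

lam0, lam1 = 1.0, 1.0 / np.sqrt(2.0)            # Fritz John multipliers
print(lam0 * grad_f + lam1 * grad_g1)           # approximately [0, 0]
```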

Mangasarian-Fromowitz Constraint Qualification

From the Fritz John optimality conditions we conclude that either we have $ \lambda_0 \neq 0$ and

$$\begin{align}\nabla f(\bar{x}) + \sum_{i\in\mathcal{I}(\bar{x})}\tfrac{\lambda_i}{\lambda_0} \nabla g_i(\bar{x}) = 0,\end{align}$$

or, $ \lambda_0 = 0$ and the optimality conditions read

$$\begin{align}\sum_{i\in\mathcal{I}(\bar{x})} \lambda_i \nabla g_i(\bar{x}) = 0,\tag{4}\end{align}$$

which do not depend on the cost function. One could require that the gradients $ \{\nabla g_i(\bar{x})\}_{i\in\mathcal{I}(\bar{x})}$ be linearly independent, so that (4) forces $ \lambda_i = 0$ for all $ i \in \mathcal{I}(\bar{x})$. According to Theorem 2, this implies that $ \lambda_0 \neq 0$ (since not all multipliers can be equal to zero). This is the linear independence constraint qualification, also known as LICQ.

An alternative approach is to require that there be a direction $ d \in X$ such that $ \langle \nabla g_i(\bar{x}), d \rangle < 0$ for all $ i \in \mathcal{I}(\bar{x})$. This is the Mangasarian-Fromowitz constraint qualification (MFCQ), which is weaker than LICQ (LICQ implies MFCQ).

If MFCQ holds, then the case $ \lambda_0 = 0$ can be excluded: taking the inner product of (4) with the MFCQ direction $ d$ would give $ 0 = \sum_{i\in\mathcal{I}(\bar{x})} \lambda_i \langle \nabla g_i(\bar{x}), d \rangle < 0$, a contradiction, since the $ \lambda_i$ are nonnegative and not all zero. Therefore, we can define $ \lambda_i' = \lambda_i / \lambda_0 \geq 0$ so that $ \nabla f(\bar{x}) + \sum_{i\in\mathcal{I}(\bar{x})} \lambda_i' \nabla g_i(\bar{x}) = 0$. In other words, there is a Lagrange multiplier vector that corresponds to $ \bar{x}$.
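A classical example (not from the notes) where MFCQ fails: minimise $ f(x) = x$ over $ x \in \mathbb{R}$ subject to $ g_1(x) = x^2 \leq 0$. The only feasible point is $ \bar{x} = 0$, where $ \nabla g_1(\bar{x}) = 0$, so there is no direction $ d$ with $ \langle \nabla g_1(\bar{x}), d \rangle < 0$. The Fritz John conditions hold with $ \lambda_0 = 0$ and $ \lambda_1 = 1$, but

$$\begin{align}\nabla f(\bar{x}) + \lambda_1' \nabla g_1(\bar{x}) = 1 \neq 0\end{align}$$

for every $ \lambda_1' \geq 0$, so no Lagrange multiplier exists at $ \bar{x}$.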

Subdifferentiability

The subderivative of a function $ f: X \to \overline{\mathbb{R}}$, finite at $ \bar{x}$, at the point $ \bar{x}$ for a direction $ d \in X$ is given by

$$\begin{align}{\rm d}f(\bar{x})(d) = \liminf_{t \downarrow 0,\ d' \to d}\frac{f(\bar{x} + td') - f(\bar{x})}{t}.\tag{5}\end{align}$$

It turns out that $ g(x) = \max_{i\in\mathcal{N}}\{g_i(x)\}$ is subdifferentiable, with

$$\begin{align}{\rm d}g(\bar{x})(d) = g'(\bar{x}; d) = \max_{i\in\mathcal{I}_{\bar{x}}}\{\left\langle\nabla g_i(\bar{x}), d \right\rangle\}.\tag{6}\end{align}$$

This is Exercise 8.31 in Rockafellar and Wets's "Variational Analysis."

Regular Subgradient

For a function $ f : X \to \overline{\mathbb{R}}$ and a point $ \bar{x} \in X$ where $ f$ is finite, we say that a vector $ v \in X$ is a regular subgradient of $ f$ at $ \bar{x}$ (denoted as $ v \in \widehat{\partial}f(\bar{x})$) if

$$\begin{align}f(x) \geq f(\bar{x}) + \langle v, x - \bar{x}\rangle + o(\|x - \bar{x}\|),\end{align}$$

where the little-o notation is to be interpreted as

$$\begin{align}\liminf_{x \to \bar{x}, x \neq \bar{x}} \frac{f(x) - f(\bar{x}) - \langle v, x - \bar{x}\rangle }{\|x-\bar{x}\|} \geq 0.\end{align}$$

It can be shown (Exercise 8.4 in Rockafellar and Wets's "Variational Analysis") that

$$\begin{align}\widehat{\partial}f(\bar{x}) {}={} \{v \in X {}:{} \langle v, w \rangle \leq {\rm d}f(\bar{x})(w), \forall w \in X\}.\end{align}$$

We will state this without a proof.
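For example, take $ f(x) = |x| = \max\{x, -x\}$ on $ X = \mathbb{R}$ and $ \bar{x} = 0$: then $ {\rm d}f(0)(w) = |w|$, and the characterisation above gives $ \widehat{\partial}f(0) = \{v {}:{} vw \leq |w|, \forall w \in \mathbb{R}\} = [-1, 1]$, which is exactly the convex hull $ \mathrm{conv}\{1, -1\}$ of the gradients of the two active functions $ x \mapsto x$ and $ x \mapsto -x$.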

So going back to $ g(x) = \max_{i\in\mathcal{N}} \{g_i(x) \}$ with $ g_i:X \to \mathbb{R}$ being differentiable functions, we have

$$\begin{align}\widehat{\partial}g(\bar{x}) {}={}& \{v \in X {}:{} \langle v, w \rangle \leq {\rm d}g(\bar{x})(w), \forall w \in X\} \\ {}={}& \left\{v \in X {}:{} \langle v, w \rangle \leq \max_{i\in\mathcal{I}_{\bar{x}}} \langle \nabla g_i(\bar{x}), w\rangle, \forall w \in X\right\} \\ {}={}& \left\{v \in X {}:{} \max_{i\in\mathcal{I}_{\bar{x}}} \langle \nabla g_i(\bar{x}) - v, w\rangle \geq 0, \forall w \in X\right\}.\end{align}$$

It is straightforward to show that $ \mathrm{conv}\{\nabla g_i(\bar{x}), i \in \mathcal{I}_{\bar{x}}\} \subseteq \widehat{\partial}g(\bar{x})$, where $ \mathrm{conv}$ denotes the convex hull. Then, by virtue of the separation theorem, one can show that

$$\begin{align}\widehat{\partial}g(\bar{x}) {}={} \mathrm{conv}\{\nabla g_i(\bar{x}), i \in \mathcal{I}_{\bar{x}}\}.\end{align}$$

This has been shown for the (convex) subdifferential of the pointwise maximum of finitely many convex functions - see, for example, these lecture notes (Section 3.4).
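As a final numerical sanity check (a Python sketch reusing the two hypothetical functions from the first sketch, with $\bar{x} = (1,1)$), any convex combination $v$ of the active gradients should satisfy $\langle v, w\rangle \leq \max_{i\in\mathcal{I}_{\bar{x}}}\langle \nabla g_i(\bar{x}), w\rangle$ for every $w$; below this is tested on randomly sampled directions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Gradients at x_bar = (1, 1) of the hypothetical functions used earlier,
# g1(x) = x1 + x2^2 and g2(x) = x2 + x1^2; both are active at x_bar.
grads = np.array([[1.0, 2.0],     # grad g1(x_bar)
                  [2.0, 1.0]])    # grad g2(x_bar)

# A convex combination v of the active gradients ...
theta = 0.3
v = theta * grads[0] + (1.0 - theta) * grads[1]

# ... should satisfy <v, w> <= max_i <grad g_i(x_bar), w> for every w,
# i.e., v should be a regular subgradient of g at x_bar.
ok = all(v @ w <= np.max(grads @ w) + 1e-12
         for w in rng.standard_normal((10_000, 2)))
print("subgradient inequality holds on all sampled directions:", ok)
```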