## The Dual Problem

We consider the problem

$\operatorname{minimize} f(x)$

$$

\text { subject to } x \in X, \quad g_{j}(x) \leq 0, \quad j=1, \ldots, r

$$

where $f: \Re^{n} \mapsto \Re, g_{j}: \Re^{n} \mapsto \Re$ are given functions, and $X$ is a subset of $\Re^{n}$. We also denote

$$

g(x)=\left(g_{1}(x), \ldots, g_{r}(x)\right)

$$

and we write the constraints $g_{j}(x) \leq 0$ compactly as $g(x) \leq 0$. We refer to this problem as the primal problem and we denote by $f^{}$ its optimal value: $$ f^{}=\inf {x \in X \atop g{j}(x) \leq 0, j=1, \ldots . r} f(x)

$$

Unless the opposite is clearly stated, we will assume throughout the following condition.

(Feasibility and Boundedness) There exists at least one feasible solution for the primal problem and the cost is bounded from below, i.e., $-\infty<f^{*}<\infty$.

## Lagrange Multipliers

We want to define a notion of a Lagrange multiplier that is not tied to a specific local or global minimum, and does not assume differentiability of the cost and constraint functions. To this end, we draw motivation from the case where $X=\Re^{n}$, and $f$ and $g_{j}$ are convex and differentiable. Then, if $x^{}$ is a global minimum and a regular point, there exists a vector $\mu^{}=$ $\left(\mu_{1}^{}, \ldots, \mu_{r}^{}\right)$ such that $\mu_{j}^{} \geq 0$ and $\mu_{j}^{} g_{j}\left(x^{}\right)=0$ for all $j$, and $\nabla_{x} L\left(x^{}, \mu^{}\right)=$ 0 or equivalently, $$ f^{}=f\left(x^{}\right)=\min {x \in \Re^{n}} L\left(x, \mu^{}\right),

$$

where $L: \Re^{n+r} \mapsto \Re$ is the Lagrangian function

$$

L(x, \mu)=f(x)+\sum{j=1}^{r} \mu_{j} g_{j}(x)=f(x)+\mu^{\prime} g(x) .

$$

We are thus led to the following definition:

Definition 5.1.1: A vector $\mu^{}=\left(\mu_{1}^{}, \ldots, \mu_{r}^{}\right)$ is said to be a Lagrange multiplier vector (or simply Lagrange multiplier) for the primal problem if $$ \mu_{j}^{} \geq 0, \quad j=1, \ldots, r

$$

and

$$

f^{}=\inf _{x \in X} L\left(x, \mu^{}\right) .

$$

To visualize the definition of a Lagrange multiplier, as well as other concepts related to duality, it is useful to consider hyperplanes in the space of constraint-cost pairs $(g(x), f(x))$ (viewed as vectors in $\left.\Re^{r+1}\right)$. We recall that a hyperplane in $\Re^{r+1}$ is specified by a linear equation involving a nonzero vector $\left(\mu, \mu_{0}\right)$ (called the normal vector of $H)$, where $\mu \in \Re^{r}$ and $\mu_{0} \in \Re$, and by a constant $c$ as follows:

$$

H=\left{(z, w) \mid z \in \Re r, w \in \Re, \mu_{0} w+\mu^{\prime} z=c\right}

$$

Any vector $(\bar{z}, \bar{w})$ that belongs to the hyperplane $H$ specifies the constant

$c$ as

$$

c=\mu_{0} \vec{w}+\mu^{\prime} \bar{z}

$$

Thus the hyperplane with given normal $\left(\mu, \mu_{0}\right)$ that passes through a given vector $(\bar{z}, \bar{w})$ is the set of $(z, w)$ that satisfy the equation

$$

\mu_{0} w+\mu^{\prime} z=\mu_{0} \bar{w}+\mu^{\prime} \bar{z}

$$

This hyperplane defines two halfspaces: the positive halfspace

$$

H^{+}=\left{(z, w) \mid \mu_{0} w+\mu^{\prime} z \geq \mu_{0} \bar{w}+\mu^{\prime} \bar{z}\right}

$$

and the negative halfspace

$$

H^{-}=\left{(z, w) \mid \mu_{0} w+\mu^{\prime} z \leq \mu_{0} \vec{w}+\mu^{\prime} \bar{z}\right}

$$

Hyperplanes with normals $\left(\mu, \mu_{0}\right)$ where $\mu_{0} \neq 0$ are referred to as nonvertical (their normal has a nonzero last component). A nonvertical hyperplane can be normalized by dividing its normal vector by $\mu_{0}$, and assuming this is done, we have $\mu_{0}=1$.

The above definitions will now be used to interpret Lagrange multipliers as normalized nonvertical hyperplanes with a certain orientation to the set of all constraint-cost pairs as $x$ ranges over $X$, i.e., the subset of $\Re^{r+1}$

$$

S={(g(x), f(x)) \mid x \in X}

$$

We have the following lemma, which is graphically illustrated

Visualization Lemma:

(a) The hyperplane with normal $(\mu, 1)$ that passes through a vector $(g(x), f(x))$ intercepts the vertical axis ${(0, w) \mid w \in \Re}$ at the level $L(x, \mu)$

(b) Among all hyperplanes with normal ( $\mu, 1$ ) that contain in their positive halfspace the set $S$ of Eq. (5.1), the highest attained level of interception of the vertical axis is inf $_{x \in X} L(x, \mu)$.

(c) $\mu^{}$ is a Lagrange multiplier (as per Definition 5.1.1) if and only if $\mu^{} \geq 0$ and among all hyperplanes with normal $\left(\mu^{}, 1\right)$ that contain in their positive halfspace the set $S$, the highest attained level of interception of the vertical axis is $f^{}$.

$$

w+\mu^{\prime} z=f(x)+\mu^{\prime} g(x)=L(x, \mu)

$$

The only vector in the vertical axis satisfying this equation is $(0, L(x, \mu))$.

(b) The hyperplane with normal $(\mu, 1)$ that intercepts the vertical axis at level $c$ is the set of vectors $(z, w)$ that satisfy the equation

$$

w+\mu^{\prime} z=c,

$$

and this hyperplane contains $S$ in its positive halfspace if and only if

$$

L(x, \mu)=f(x)+\mu^{\prime} g(x) \geq c, \quad \forall x \in X .

$$

Therefore the maximum point of interception is $c^{*}=\inf _{x \in X} L(x, \mu)$

(c) Follows from the definition of a Lagrange multiplier and part (b).

If a Lagrange multiplier $\mu^{}$ is known, then all optimal solutions $x^{}$ can be obtained by minimizing the Lagrangian $L\left(x, \mu^{}\right)$ over $x \in X$, as indicated in Fig. $5.1 .2(\mathrm{c})$ and shown in the following proposition. However, there may be vectors that minimize $L\left(x, \mu^{}\right)$ over $x \in X$ but do not satisfy the inequality constraints $g(x) \leq 0[\mathrm{cf}$.

Let $\mu^{}$ be a Lagrange multiplier. Then $x^{}$ is a global minimum of the primal problem if and only if $x^{}$ is feasible and $$ x^{}=\arg \min {x \in X} L\left(x, \mu^{}\right), \quad \mu{j}^{} g_{j}\left(x^{*}\right)=0, \quad j=1, \ldots, r

$$

If $x^{}$ is a global minimum, then $x^{}$ is feasible and furthermore,

$$

f^{}=f\left(x^{}\right) \geq f\left(x^{}\right)+\sum_{j=1}^{r} \mu_{j}^{} g_{j}\left(x^{}\right)=L\left(x^{}, \mu^{}\right) \geq \inf {x \in X} L\left(x, \mu^{}\right)

$$

where the first inequality follows from the definition of a Lagrange multiplier $\left(\mu^{} \geq 0\right)$ and the feasibility of $x^{}\left[g\left(x^{}\right) \leq 0\right] .$ Using again the definition of Lagrange multiplier, we have $f^{}=\inf {x \in X} L\left(x, \mu^{*}\right)$, so that equality holds throughout in Eq. (5.3), implying Eq. (5.2).

Conversely, if $x^{}$ is feasible and Eq. (5.2) holds, we have, using also the definition of a Lagrange multiplier, $$ f\left(x^{}\right)=f\left(x^{}\right)+\sum_{j=1}^{r} \mu_{j}^{} g_{j}\left(x^{}\right)=L\left(x^{}, \mu^{}\right)=\min _{x \in X} L\left(x, \mu^{}\right)=f^{} $$ so $x^{}$ is a global minimum. Q.E.D.

## The Weak Duality Theorem

We consider the dual function $q$ defined for $\mu \in \Re^{r}$ by

$$

q(\mu)=\inf _{x \in X} L(x, \mu)

$$

This definition is illustrated in Fig. $5.1 .5$, where $q(\mu)$ is interpreted as the highest point of interception with the vertical axis over all hyperplanes with normal $(\mu, 1)$, which contain the set $S$ in their positive halfspace [compare also with the visualization lemma and Fig. $5.1 .2(\mathrm{a})] .$ The dual problem is

maximize $q(\mu)$ subject to $\mu \geq 0$

and corresponds to finding the maximum point of interception, over all hyperplanes with normal $(\mu, 1)$ where $\mu \geq 0$.

Note that $q(\mu)$ may be equal to $-\infty$ for some $\mu .$ In this case, effectively we have the additional constraint

$$

\mu \in D

$$

in the dual problem, where $D$, called the domain of $q$, is the set of $\mu$ for which $q(\mu)$ is finite:

$$

D={\mu \mid q(\mu)>-\infty}

$$

In fact, we may have $q(\mu)=-\infty$ for all $\mu \geq 0$, in which case the dual optimal value

$$

q^{*}=\sup _{\mu \geq 0} q(\mu)

$$

is equal to $-\infty$ (situations where this can happen will be discussed later). Regardless of the structure of the cost and constraints of the primal problem, the dual problem has nice convexity properties, as shown by the following proposition.

The domain $D$ of the dual function $q$ is convex and $q$ is concave over $D$.

For any $x, \mu, \bar{\mu}$, and $\alpha \in[0,1]$, we have

$$

L(x, \alpha \mu+(1-\alpha) \bar{\mu})=\alpha L(x, \mu)+(1-\alpha) L(x, \bar{\mu}) .

$$

Taking the infimum over all $x \in X$, we obtain

$$

\inf {x \in X} L(x, \alpha \mu+(1-\alpha) \bar{\mu}) \geq \alpha \inf {x \in X} L(x, \mu)+(1-\alpha) \inf _{x \in X} L(x, \bar{\mu})

$$

or

$$

q(\alpha \mu+(1-\alpha) \bar{\mu}) \geq \alpha q(\mu)+(1-\alpha) q(\bar{\mu})

$$

Therefore if $\mu$ and $\bar{\mu}$ belong to $D$, the same is true for $\alpha \mu+(1-\alpha) \bar{\mu}$, so $D$ is convex. Furthermore, $q$ is concave over $D .$

The concavity of $q$ can also be verified by observing that $q$ is defined as the infimum of a collection of the concave (in fact linear) functions $L(x, \cdot)$ parameterized by $x \in X$; such a function is concave

Another important property is that the optimal dual value is always an underestimate of the optimal primal value. This is evident from the geometric interpretation of and is formally shown in the next proposition.

For all $\mu \geq 0$, and $x \in X$ with $g(x) \leq 0$, we have

$$

q(\mu)=\inf {z \in X} L(z, \mu) \leq f(x)+\sum{j=1}^{r} \mu_{j} g_{j}(x) \leq f(x)

$$

So

$$

q^{}=\sup {\mu \geq 0} q(\mu) \leq \inf {x \in X, g(x) \leq 0} f(x)=f^{} .

$$

If $q^{}=f^{}$ we say that there is no duality gap and if $q^{}}$ we say that there is a duality gap. Note that if there exists a Lagrange multiplier $\mu^{}$, the weak duality theorem $\left(q^{} \leq f^{}\right)$ and the definition of a Lagrange multiplier $\left[f^{}=q\left(\mu^{}\right) \leq q^{}\right]$ imply that there is no duality gap. However, the converse is not true. In particular, it is possible that no Lagrange multiplier exists even though there is no duality gap [cf. Fig. $5.1 .4(\mathrm{a})] ;$ in this case the dual problem does not have an optimal solution, as implied by the following proposition.

(a) If there is no duality gap, the set of Lagrange multipliers is equal to the set of optimal dual solutions.

(b) If there is a duality gap, the set of Lagrange multipliers is empty.