
2.7 Problems with equality constraints

In this section, we shall consider nonlinear programming problems with equality constraints of the form:

$$\text{minimize } f(x) \quad \text{subject to } h_i(x) = 0, \quad i = 1, \ldots, n_h$$

Based on the material presented in 2.6, it is tempting to convert this problem into an inequality constrained problem, by replacing each equality constraint $h_i(x) = 0$ by two inequality constraints $h_i^+(x) = h_i(x) \le 0$ and $h_i^-(x) = -h_i(x) \le 0$. Given a feasible point $\bar{x} \in \mathbb{R}^{n_x}$, we would have $h_i^+(\bar{x}) = h_i^-(\bar{x}) = 0$ and $\nabla h_i^+(\bar{x}) = -\nabla h_i^-(\bar{x})$. Therefore, there can exist no vector $d$ such that $\nabla h_i^+(\bar{x})^\top d < 0$ and $\nabla h_i^-(\bar{x})^\top d < 0$ simultaneously, i.e., $\mathcal{D}_0(\bar{x}) = \emptyset$. In other words, the geometric conditions developed in the previous section for inequality constrained problems are satisfied by all feasible solutions and, hence, are not informative. A different approach must therefore be used to deal with equality constrained problems. After a number of preliminary results in 2.7.1, we shall describe the method of Lagrange multipliers for equality constrained problems in 2.7.2.

2.7.1 Preliminaries

An equality constraint $h(x) = 0$ defines a set in $\mathbb{R}^{n_x}$, which can be seen as a hypersurface.

When considering $n_h \ge 1$ equality constraints $h_1(x), \ldots, h_{n_h}(x)$, their intersection forms a (possibly empty) set $S := \{x \in \mathbb{R}^{n_x} : h_i(x) = 0,\ i = 1, \ldots, n_h\}$.

Throughout this section, we shall assume that the equality constraints are differentiable; the set $S := \{x \in \mathbb{R}^{n_x} : h_i(x) = 0,\ i = 1, \ldots, n_h\}$ is then said to be a differentiable manifold (or smooth manifold). Associated with a point on a differentiable manifold is the tangent set at that point. To formalize this notion, we start by defining curves on a manifold. A curve $\xi$ on a manifold $S$ is a continuous application $\xi : I \to S$, with $I \subset \mathbb{R}$ an interval, i.e., a family of points $\xi(t) \in S$ continuously parameterized by $t \in I$. A curve is said to pass through the point $\bar{x}$ if $\bar{x} = \xi(\bar{t})$ for some $\bar{t} \in I$; the derivative of a curve at $\bar{t}$, provided it exists, is defined as $\dot{\xi}(\bar{t}) := \lim_{h \to 0} \frac{\xi(\bar{t} + h) - \xi(\bar{t})}{h}$. A curve is said to be differentiable (or smooth) if a derivative exists for each $t \in I$.

Definition 2.21: Tangent Set

Let $S$ be a (differentiable) manifold in $\mathbb{R}^{n_x}$, and let $\bar{x} \in S$. Consider the collection of all the continuously differentiable curves on $S$ passing through $\bar{x}$. Then, the collection of all the vectors tangent to these curves at $\bar{x}$ is said to be the tangent set to $S$ at $\bar{x}$, denoted by $\mathcal{T}(\bar{x})$.

If the constraints are regular, in the sense of Definition 2.22 below, then $S$ is (locally) of dimension $n_x - n_h$, and $\mathcal{T}(\bar{x})$ constitutes a subspace of dimension $n_x - n_h$, called the tangent space.

Definition 2.22: Regular Point (for a Set of Equality Constraints)

Let $h_i : \mathbb{R}^{n_x} \to \mathbb{R}$, $i = 1, \ldots, n_h$, be differentiable functions on $\mathbb{R}^{n_x}$ and consider the set $S := \{x \in \mathbb{R}^{n_x} : h_i(x) = 0,\ i = 1, \ldots, n_h\}$. A point $\bar{x} \in S$ is said to be a regular point if the gradient vectors $\nabla h_i(\bar{x})$, $i = 1, \ldots, n_h$, are linearly independent, i.e.,

$$\operatorname{rank}\left(\nabla h_1(\bar{x})\ \ \nabla h_2(\bar{x})\ \ \cdots\ \ \nabla h_{n_h}(\bar{x})\right) = n_h$$

Lemma 2.3: Algebraic Characterization of a Tangent Space

Let $h_i : \mathbb{R}^{n_x} \to \mathbb{R}$, $i = 1, \ldots, n_h$, be differentiable functions on $\mathbb{R}^{n_x}$ and consider the set $S := \{x \in \mathbb{R}^{n_x} : h_i(x) = 0,\ i = 1, \ldots, n_h\}$. At a regular point $\bar{x} \in S$, the tangent space is such that

$$\mathcal{T}(\bar{x}) = \{d : \nabla h(\bar{x})^\top d = 0\}$$

Proof. ... □

Recall that $h_x = \frac{\partial h}{\partial x} \in \mathbb{R}^{n_h \times n_x}$ is the Jacobian of the constraints. Therefore, the tangent directions $d$ must be orthogonal to the gradient of each constraint $h_i$ at $\bar{x}$. Note also that $h_x d = 0$ defines the kernel of the Jacobian at $\bar{x}$, that is, the set $K(h_x) := \{d : h_x d = 0\}$. Since we have assumed that $\bar{x}$ is a regular point, the Jacobian has full rank and, by the rank-nullity theorem, the dimension of the kernel is $\dim(K(h_x)) = n_x - n_h$, which is precisely the dimension of the tangent space.
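Numerically, this characterization suggests computing a basis of the tangent space as the null space of the constraint Jacobian. Below is a minimal sketch in Python (assuming NumPy and SciPy are available; the constraint and the point are our own illustrative choices):

```python
import numpy as np
from scipy.linalg import null_space

# Illustrative single constraint h(x) = x1^2 + x2^2 - 2 (n_h = 1, n_x = 2).
def constraint_jacobian(x):
    """Jacobian h_x of the constraints, of shape (n_h, n_x)."""
    return np.array([[2.0 * x[0], 2.0 * x[1]]])

x_bar = np.array([1.0, 1.0])                 # feasible: 1 + 1 - 2 = 0
J = constraint_jacobian(x_bar)

# Regularity check: the Jacobian must have full row rank n_h.
assert np.linalg.matrix_rank(J) == J.shape[0]

# Tangent space T(x_bar) = ker(J); by rank-nullity its dimension is
# n_x - n_h = 1.
E = null_space(J)                            # shape (n_x, n_x - n_h)
print(E)                                     # a multiple of (1, -1)/sqrt(2)
assert np.allclose(J @ E, 0.0)
```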

2.7.2 The Method of Lagrange Multipliers

The idea behind the method of Lagrange multipliers for solving equality constrained NLP problems of the form

$$\text{minimize } f(x) \quad \text{subject to } h_i(x) = 0, \quad i = 1, \ldots, n_h$$

is to restrict the search for a minimum to the manifold $S := \{x \in \mathbb{R}^{n_x} : h_i(x) = 0,\ i = 1, \ldots, n_h\}$. In other words, we derive optimality conditions by considering the value of the objective function along curves on the manifold $S$ passing through the optimal point.

The following theorem shows that the tangent space $\mathcal{T}(x^*)$ at a regular (local) minimum point $x^*$ is orthogonal to the gradient of the objective function at $x^*$. This important fact is illustrated in Fig. ?? in the case of a single equality constraint.

Theorem 2.12: Geometric Necessary Condition for a Local Minimum

Let $f : \mathbb{R}^{n_x} \to \mathbb{R}$ and $h_i : \mathbb{R}^{n_x} \to \mathbb{R}$, $i = 1, \ldots, n_h$, be continuously differentiable functions on $\mathbb{R}^{n_x}$. Suppose that $x^*$ is a local minimum point of the problem to minimize $f(x)$ subject to the constraints $h(x) = 0$. Then, $\nabla f(x^*)$ is orthogonal to the tangent space $\mathcal{T}(x^*)$, that is:

$$\mathcal{F}_0(x^*) \cap \mathcal{T}(x^*) = \emptyset$$

where $\mathcal{F}_0(x^*) := \{d : \nabla f(x^*)^\top d < 0\}$ denotes the set of descent directions of $f$ at $x^*$ (see 2.6).

Proof. By contradiction, assume that there exists a $d \in \mathcal{T}(x^*)$ such that $\nabla f(x^*)^\top d \neq 0$. Let $\xi : I = [-a, a] \to S$, $a > 0$, be any smooth curve passing through $x^*$ with $\xi(0) = x^*$ and $\dot{\xi}(0) = d$. Let also $\varphi$ be the function defined as $\varphi(t) := f(\xi(t))$, $\forall t \in I$. Since $x^*$ is a local minimum of $f$ on $S := \{x \in \mathbb{R}^{n_x} : h_i(x) = 0,\ i = 1, \ldots, n_h\}$, by Definition 2.11, we have

$$\exists \eta > 0 \text{ such that } \varphi(t) = f(\xi(t)) \ge f(x^*) = \varphi(0), \quad \forall t \in \mathcal{B}_\eta(0)$$

It follows that $t = 0$ is an unconstrained (local) minimum point for $\varphi$, and

$$0 = \dot{\varphi}(0) = \nabla f(x^*)^\top \dot{\xi}(0) = \nabla f(x^*)^\top d$$

We thus get a contradiction with the fact that $\nabla f(x^*)^\top d \neq 0$. □

Next, we take advantage of the foregoing geometric characterization, and derive first-order necessary conditions for equality constrained NLP problems.

Theorem 2.13: First-Order Necessary Conditions

Let $f : \mathbb{R}^{n_x} \to \mathbb{R}$ and $h_i : \mathbb{R}^{n_x} \to \mathbb{R}$, $i = 1, \ldots, n_h$, be continuously differentiable functions on $\mathbb{R}^{n_x}$. Consider the problem to minimize $f(x)$ subject to the constraints $h(x) = 0$. If $x^*$ is a local minimum and is a regular point of the constraints, then there exists a unique vector $\lambda^* \in \mathbb{R}^{n_h}$ such that:

$$\nabla f(x^*) + \nabla h(x^*)\,\lambda^* = \nabla f(x^*) + \sum_{i=1}^{n_h} \lambda_i^* \nabla h_i(x^*) = 0$$

Proof. Since $x^*$ is a local minimum of $f$ on $S := \{x \in \mathbb{R}^{n_x} : h(x) = 0\}$, by Theorem 2.12, we have $\mathcal{F}_0(x^*) \cap \mathcal{T}(x^*) = \emptyset$, i.e., the system

$$\nabla f(x^*)^\top d < 0, \qquad \nabla h(x^*)^\top d = 0$$

is inconsistent. Consider the following two sets:

$$C_1 := \{(z_1, z_2) \in \mathbb{R}^{n_h + 1} : z_1 = \nabla f(x^*)^\top d,\ z_2 = \nabla h(x^*)^\top d,\ d \in \mathbb{R}^{n_x}\}$$
$$C_2 := \{(z_1, z_2) \in \mathbb{R}^{n_h + 1} : z_1 < 0,\ z_2 = 0\}$$

Clearly, $C_1$ and $C_2$ are convex and $C_1 \cap C_2 = \emptyset$. Then, by the separation theorem (Theorem 2.14), there exists a nonzero vector $(\mu, \lambda) \in \mathbb{R}^{n_h + 1}$ such that

$$\mu\,\nabla f(x^*)^\top d + \lambda^\top\left[\nabla h(x^*)^\top d\right] \ge \mu z_1 + \lambda^\top z_2, \quad \forall d \in \mathbb{R}^{n_x},\ \forall (z_1, z_2) \in \operatorname{cl}(C_2)$$

Letting $z_2 = 0$ and since $z_1$ can be made an arbitrarily large negative number, it follows that $\mu \ge 0$ (if $\mu < 0$, then since $z_1 < 0$, the lower bound $\mu z_1$ on the left-hand side would be positive and arbitrarily large). Also, letting $(z_1, z_2) = (0, 0) \in \operatorname{cl}(C_2)$, we must have $[\mu \nabla f(x^*) + \nabla h(x^*)\lambda]^\top d \ge 0$, for each $d \in \mathbb{R}^{n_x}$. In particular, letting $d = -[\mu \nabla f(x^*) + \nabla h(x^*)\lambda]$, it follows that $-\|\mu \nabla f(x^*) + \nabla h(x^*)\lambda\|^2 \ge 0$, and thus,

$$\mu \nabla f(x^*) + \nabla h(x^*)\,\lambda = 0, \quad \text{with } (\mu, \lambda) \neq (0, 0)$$

Finally, note that $\mu > 0$, for otherwise the above equation would contradict the assumption of linear independence of $\nabla h_i(x^*)$, $i = 1, \ldots, n_h$. The result follows by letting $\lambda^* := \frac{1}{\mu}\lambda$, and noting that the linear independence assumption implies the uniqueness of these Lagrange multipliers. □

Theorem 2.14: Separation of Two Convex Sets

Let $C_1$ and $C_2$ be two nonempty, convex sets in $\mathbb{R}^n$ and suppose that $C_1 \cap C_2 = \emptyset$. Then, there exists a hyperplane that separates $C_1$ and $C_2$; that is, there exists a nonzero vector $p \in \mathbb{R}^n$ such that

$$p^\top x_1 \ge p^\top x_2, \quad \forall x_1 \in \operatorname{cl}(C_1),\ \forall x_2 \in \operatorname{cl}(C_2)$$

Remark 2.22: Obtaining Candidate Solution Points

The first-order necessary conditions

$$\nabla f(x^*) + \nabla h(x^*)\,\lambda^* = 0$$

together with the constraints

$$h(x^*) = 0$$

give a total of $n_x + n_h$ (typically nonlinear) equations in the $n_x + n_h$ variables $(x^*, \lambda^*)$. Hence, these conditions are complete in the sense that they determine, at least locally, a unique solution. However, as in the unconstrained case, a solution to the first-order necessary conditions need not be a (local) minimum of the original problem; it could very well correspond to a (local) maximum point, or some kind of saddle point. These considerations are illustrated in Example 2.15 below.
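For small problems, this square system can be handed directly to a generic root finder. The following is a minimal sketch with SciPy's fsolve on a toy problem of our choosing (minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 - 1 = 0$, whose solution $x^* = (1/2, 1/2)$, $\lambda^* = -1$ is easy to verify by hand); keep in mind that a root of this system is only a candidate and still needs to be classified:

```python
import numpy as np
from scipy.optimize import fsolve

# Toy problem (our choice): minimize x1^2 + x2^2  s.t.  x1 + x2 - 1 = 0.
def first_order_conditions(z):
    """n_x + n_h equations in the n_x + n_h unknowns (x, lambda)."""
    x, lam = z[:2], z[2]
    grad_f = 2.0 * x                        # gradient of the objective
    grad_h = np.array([1.0, 1.0])           # gradient of the constraint
    stationarity = grad_f + lam * grad_h    # grad_x L(x, lambda) = 0
    feasibility = [x[0] + x[1] - 1.0]       # h(x) = 0
    return np.concatenate([stationarity, feasibility])

z0 = np.array([0.0, 1.0, 0.0])              # initial guess for (x, lambda)
x1, x2, lam = fsolve(first_order_conditions, z0)
print(x1, x2, lam)                          # -> 0.5 0.5 -1.0
```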

Remark 2.23: Regularity-Type Assumption

It is important to note that for a local minimum to satisfy the foregoing first-order conditions and, in particular, for a unique Lagrange multiplier vector to exist, it is necessary that the equality constraints satisfy a regularity condition. In other words, the first-order conditions may not hold at a local minimum point that is non-regular. An illustration of these considerations is provided in Example 2.16.

There are a number of similarities with the constraint qualification needed for a local minimizer of an inequality constrained NLP problem to be a KKT point; in particular, the condition that the minimum point be a regular point for the constraints corresponds to LICQ (see Remark 2.21).

Remark 2.24: Lagrangian

It is convenient to introduce the Lagrangian $\mathcal{L} : \mathbb{R}^{n_x} \times \mathbb{R}^{n_h} \to \mathbb{R}$ associated with the constrained problem, by adjoining the cost and constraint functions as:

$$\mathcal{L}(x, \lambda) := f(x) + \lambda^\top h(x)$$

Thus, if $x^*$ is a local minimum which is regular, the first-order necessary conditions are written as

$$\nabla_x \mathcal{L}(x^*, \lambda^*) = 0, \qquad \nabla_\lambda \mathcal{L}(x^*, \lambda^*) = 0$$

the latter equations being simply a restatement of the constraints. Note that the solution of the original problem typically corresponds to a saddle point of the Lagrangian function.

Example 2.15 (Regular Case). Consider the problem

$$\min_{x \in \mathbb{R}^2} f(x) := x_1 + x_2 \quad \text{s.t.} \quad h(x) := x_1^2 + x_2^2 - 2 = 0$$

Observe first that every feasible point is a regular point for the equality constraint, since $\nabla h(x) = (2x_1, 2x_2)^\top$ vanishes only at the origin and the point $(0, 0)$ is infeasible. Therefore, every local minimum is a stationary point of the Lagrangian function by Theorem 2.13.

The gradient vectors $\nabla f(x)$ and $\nabla h(x)$ are given by

$$\nabla f(x) = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad \text{and} \quad \nabla h(x) = \begin{pmatrix} 2x_1 \\ 2x_2 \end{pmatrix}$$

so that the first-order necessary conditions read

$$1 + 2\lambda x_1 = 0, \qquad 1 + 2\lambda x_2 = 0, \qquad x_1^2 + x_2^2 = 2$$

These three equations can be solved for the three unknowns $x_1$, $x_2$ and $\lambda$. Two candidate local minimum points are obtained: (i) $x_1^* = x_2^* = -1$, $\lambda^* = \frac{1}{2}$, and (ii) $x_1^* = x_2^* = 1$, $\lambda^* = -\frac{1}{2}$. These results are illustrated in Fig. 2.8. It can be seen that only the former actually corresponds to a local minimum point, while the latter gives a local maximum point.
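These candidates are easily reproduced symbolically; a minimal sketch using SymPy (our tooling choice, not part of the original example):

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
f = x1 + x2
h = x1**2 + x2**2 - 2
L = f + lam * h                        # Lagrangian of Example 2.15

# Stationarity in x together with the constraint h(x) = 0
candidates = sp.solve([sp.diff(L, x1), sp.diff(L, x2), h],
                      [x1, x2, lam], dict=True)
print(candidates)
# -> [{x1: -1, x2: -1, lam: 1/2}, {x1: 1, x2: 1, lam: -1/2}]
```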

Example 2.16 (Non-Regular Case). Consider the problem

$$\min_{x \in \mathbb{R}^2} f(x) := x_1 \quad \text{s.t.} \quad h_1(x) := (1 - x_1)^3 + x_2 = 0, \quad h_2(x) := (1 - x_1)^3 - x_2 = 0$$

As shown in Fig. 1.13, this problem has only one feasible point, namely, $x^* = (1, 0)^\top$; that is, $x^*$ is also the unique global minimum of the problem. However, at this point, we have

$$\nabla f(x^*) = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \nabla h_1(x^*) = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \quad \text{and} \quad \nabla h_2(x^*) = \begin{pmatrix} 0 \\ -1 \end{pmatrix}$$

hence the first-order conditions

$$\lambda_1 \begin{pmatrix} 0 \\ 1 \end{pmatrix} + \lambda_2 \begin{pmatrix} 0 \\ -1 \end{pmatrix} = -\begin{pmatrix} 1 \\ 0 \end{pmatrix}$$

cannot be satisfied. This illustrates the fact that a minimum point may not be a stationary point for the Lagrangian if that point is non-regular.
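Numerically, the failure of regularity shows up as a rank-deficient constraint Jacobian at $x^*$; a quick check (a sketch, with our own helper names):

```python
import numpy as np

def constraint_jacobian(x):
    """Jacobian of h1(x) = (1-x1)^3 + x2 and h2(x) = (1-x1)^3 - x2."""
    return np.array([[-3.0 * (1.0 - x[0])**2,  1.0],
                     [-3.0 * (1.0 - x[0])**2, -1.0]])

x_star = np.array([1.0, 0.0])           # the unique feasible point
J = constraint_jacobian(x_star)
print(J)                                 # [[ 0.  1.] [ 0. -1.]]
print(np.linalg.matrix_rank(J))          # 1 < n_h = 2: x* is not regular
```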

Figure 2.8: Solution of Example 2.15

The following theorem provides second-order necessary conditions for a point to be a local minimum of a NLP problem with equality constraints.

Theorem 2.15: Second-Order Necessary Conditions

Let $f : \mathbb{R}^{n_x} \to \mathbb{R}$ and $h_i : \mathbb{R}^{n_x} \to \mathbb{R}$, $i = 1, \ldots, n_h$, be twice continuously differentiable functions on $\mathbb{R}^{n_x}$. Consider the problem to minimize $f(x)$ subject to the constraints $h(x) = 0$. If $x^*$ is a local minimum and is a regular point of the constraints, then there exists a unique vector $\lambda^* \in \mathbb{R}^{n_h}$ such that

$$\nabla f(x^*) + \nabla h(x^*)\,\lambda^* = 0$$

and

$$d^\top \left(\nabla^2 f(x^*) + \sum_{i=1}^{n_h} \lambda_i^* \nabla^2 h_i(x^*)\right) d \ge 0, \quad \forall d \text{ such that } \nabla h(x^*)^\top d = 0$$

Proof. Note first that $\nabla f(x^*) + \nabla h(x^*)\,\lambda^* = 0$ directly follows from Theorem 2.13. Let $d$ be an arbitrary direction in $\mathcal{T}(x^*)$; that is, $\nabla h(x^*)^\top d = 0$ since $x^*$ is a regular point (see Lemma 2.3). Consider any twice differentiable curve $\xi : I = [-a, a] \to S$, $a > 0$, passing through $x^*$ with $\xi(0) = x^*$ and $\dot{\xi}(0) = d$. Let $\varphi$ be the function defined as $\varphi(t) := f(\xi(t))$, $\forall t \in I$. Since $x^*$ is a local minimum of $f$ on $S := \{x \in \mathbb{R}^{n_x} : h(x) = 0\}$, $t = 0$ is an unconstrained (local) minimum point for $\varphi$. By Theorem 2.4, it follows that

$$0 \le \ddot{\varphi}(0) = \dot{\xi}(0)^\top \nabla^2 f(x^*)\, \dot{\xi}(0) + \nabla f(x^*)^\top \ddot{\xi}(0)$$

(note that $\varphi$ is a scalar function; as a consequence, $\ddot{\varphi} = \frac{d^2\varphi}{dt^2}$).

Furthermore, differentiating the relation $\lambda^{*\top} h(\xi(t)) = 0$ twice, we obtain

$$\dot{\xi}(0)^\top \left(\sum_{i=1}^{n_h} \lambda_i^* \nabla^2 h_i(x^*)\right) \dot{\xi}(0) + \left(\nabla h(x^*)\,\lambda^*\right)^\top \ddot{\xi}(0) = 0$$

Adding the last two equations and using the first-order condition $\nabla f(x^*) + \nabla h(x^*)\,\lambda^* = 0$ yields

$$d^\top \left(\nabla^2 f(x^*) + \sum_{i=1}^{n_h} \lambda_i^* \nabla^2 h_i(x^*)\right) d \ge 0$$

and this condition must hold for every $d$ such that $\nabla h(x^*)^\top d = 0$. □

It is useful to shed more light on the derivation in the foregoing proof. To this end, we carry out the derivatives of $\psi(t) := \lambda^{*\top} h(\xi(t))$ explicitly. Note that since the curve $\xi$ lies in $S$, we have $h_i(\xi(t)) = 0$, $i = 1, \ldots, n_h$, $\forall t$. Hence $\psi(t)$ is identically zero (i.e., $\psi(t) \equiv 0$, $\forall t$). We can expand $\psi$ as $\psi(t) = \sum_{i=1}^{n_h} \lambda_i^* h_i(\xi(t))$. The first total derivative is:

$$\dot{\psi} = \sum_{i=1}^{n_h} \sum_{j=1}^{n_x} \lambda_i^* \frac{\partial h_i}{\partial x_j} \dot{\xi}_j = \left[\nabla h\, \lambda^*\right]^\top \dot{\xi} = \lambda^{*\top} h_x\, \dot{\xi} = 0$$

We can now differentiate again with respect to time to derive the second total derivative:

$$\ddot{\psi} = \sum_{i=1}^{n_h} \sum_{j=1}^{n_x} \sum_{k=1}^{n_x} \lambda_i^* \frac{\partial^2 h_i}{\partial x_k \partial x_j} \dot{\xi}_k \dot{\xi}_j + \sum_{i=1}^{n_h} \sum_{j=1}^{n_x} \lambda_i^* \frac{\partial h_i}{\partial x_j} \ddot{\xi}_j = \sum_{i=1}^{n_h} \lambda_i^*\, \dot{\xi}^\top \nabla^2 h_i\, \dot{\xi} + \sum_{i=1}^{n_h} \lambda_i^*\, \nabla h_i^\top \ddot{\xi} = 0$$

Recall that $\xi(0) = x^*$ and $\dot{\xi}(0) = d$, and that the first-order necessary conditions hold: $\nabla f(x^*) = -\nabla h(x^*)\,\lambda^* = -\sum_{i=1}^{n_h} \lambda_i^* \nabla h_i(x^*)$. Therefore, the second derivative of $\psi$ at $0$ gives:

$$\ddot{\psi}(0) = \sum_{i=1}^{n_h} \lambda_i^*\, d^\top \nabla^2 h_i(x^*)\, d - \nabla f(x^*)^\top \ddot{\xi}(0) = 0$$

Hence:

$$\nabla f(x^*)^\top \ddot{\xi}(0) = \sum_{i=1}^{n_h} \lambda_i^*\, d^\top \nabla^2 h_i(x^*)\, d$$

Using the last equation, it is easier to follow the proof of Theorem 2.15: substituting $\nabla f(x^*)^\top \ddot{\xi}(0)$ into $0 \le \ddot{\varphi}(0)$ directly yields $d^\top \left(\nabla^2 f(x^*) + \sum_{i=1}^{n_h} \lambda_i^* \nabla^2 h_i(x^*)\right) d \ge 0$.

Remark 2.25: Eigenvalues in Tangent Space

In the foregoing theorem, it is shown that the matrix $\nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)$ restricted to the subspace $\mathcal{T}(x^*)$ plays a key role. Geometrically, the restriction of $\nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)$ to $\mathcal{T}(x^*)$ corresponds to the projection $\mathcal{P}_{\mathcal{T}(x^*)}[\nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)]$.

A vector $y \in \mathcal{T}(x^*)$ is said to be an eigenvector of $\mathcal{P}_{\mathcal{T}(x^*)}[\nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)]$ if there is a real number $\mu$ such that:

$$\mathcal{P}_{\mathcal{T}(x^*)}[\nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)]\, y = \mu y$$

the corresponding $\mu$ is said to be an eigenvalue of $\mathcal{P}_{\mathcal{T}(x^*)}[\nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)]$ (these definitions coincide with the usual definitions of eigenvector and eigenvalue for real matrices).

Now, to obtain a matrix representation for $\mathcal{P}_{\mathcal{T}(x^*)}[\nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)]$, it is necessary to introduce a basis of the subspace $\mathcal{T}(x^*)$, say $E = [e_1, \ldots, e_{n_x - n_h}] \in \mathbb{R}^{n_x \times (n_x - n_h)}$. Then, the eigenvalues of $\mathcal{P}_{\mathcal{T}(x^*)}[\nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)]$ are the same as those of the $(n_x - n_h) \times (n_x - n_h)$ matrix $E^\top \nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)\, E$ when the basis $E$ is orthonormal; for an arbitrary basis, the eigenvalues of $E^\top \nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)\, E$ depend on the choice of $E$, but their signs, and hence the definiteness properties, do not (Sylvester's law of inertia).
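In computations, a convenient choice is the orthonormal basis of $\ker(\nabla h(x^*)^\top)$ returned by a null-space routine, so that the eigenvalues of $E^\top \nabla^2_{xx}\mathcal{L}\,E$ are exactly those of the projected operator. A minimal sketch (the helper name and toy data are ours):

```python
import numpy as np
from scipy.linalg import null_space

def reduced_hessian(hess_L, jac_h):
    """Restriction of the Hessian of the Lagrangian to the tangent space
    T(x*) = ker(jac_h), using an orthonormal basis E of the kernel."""
    E = null_space(jac_h)               # E in R^{n_x x (n_x - n_h)}
    return E.T @ hess_L @ E

# Toy check: minimize x1^2 + x2^2 s.t. x1 + x2 - 1 = 0, at x* = (1/2, 1/2):
H = 2.0 * np.eye(2)                     # Hessian of L (h is linear)
J = np.array([[1.0, 1.0]])              # Jacobian of h
print(np.linalg.eigvalsh(reduced_hessian(H, J)))   # -> [2.]: positive
```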

Example 2.17 (Regular Case Continued). Consider again Example 2.15. Two candidate local minimum points, (i) $x_1^* = x_2^* = -1$, $\lambda^* = \frac{1}{2}$, and (ii) $x_1^* = x_2^* = 1$, $\lambda^* = -\frac{1}{2}$, were obtained on application of the first-order necessary conditions. The Hessian matrix of the Lagrangian function is given by

$$\nabla^2_{xx}\mathcal{L}(x, \lambda) = \nabla^2 f(x) + \lambda \nabla^2 h(x) = \lambda \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$$

and a basis of the tangent subspace $\mathcal{T}(x)$ at a feasible point $x$ is

$$E(x) := \begin{pmatrix} -x_2 \\ x_1 \end{pmatrix}$$

Therefore,

$$E^\top \nabla^2_{xx}\mathcal{L}(x, \lambda)\, E = 2\lambda\,(x_1^2 + x_2^2)$$

Since every feasible point satisfies the constraint $h(x) = x_1^2 + x_2^2 - 2 = 0$, we have:

$$E^\top \nabla^2_{xx}\mathcal{L}(x, \lambda)\, E = 4\lambda$$

In particular, for the candidate solution point (i), we have

$$E^\top \nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)\, E = 4\lambda^* = 2 > 0$$

hence satisfying the second-order necessary conditions (in fact, this point also satisfies the second-order sufficient conditions of optimality discussed hereafter). On the other hand, for the candidate solution point (ii), we get

$$E^\top \nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)\, E = 4\lambda^* = -2 < 0$$

which does not satisfy the second-order necessary requirement, so this point cannot be a local minimum.
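The same conclusion is reached numerically with the reduced_hessian helper sketched in Remark 2.25 (the eigenvalues come out as $\pm 1$ rather than $\pm 2$ because the null-space basis is orthonormal, unlike $E(x)$ above; only the signs matter):

```python
import numpy as np
from scipy.linalg import null_space

def reduced_hessian(hess_L, jac_h):
    E = null_space(jac_h)               # orthonormal basis of the kernel
    return E.T @ hess_L @ E

# Candidates of Example 2.15: (x*, lambda*) pairs.
for x, lam in [(np.array([-1.0, -1.0]),  0.5),
               (np.array([ 1.0,  1.0]), -0.5)]:
    H = lam * 2.0 * np.eye(2)                 # Hessian of the Lagrangian
    J = np.array([[2.0 * x[0], 2.0 * x[1]]])  # Jacobian of h at x*
    print(np.linalg.eigvalsh(reduced_hessian(H, J)))
# -> [1.] for candidate (i), [-1.] for candidate (ii)
```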

The conditions given in Theorems 2.13 and 2.15 are necessary conditions that must hold at each local minimum point. Yet, a point satisfying these conditions need not be a local minimum. The following theorem provides sufficient conditions for a stationary point of the Lagrangian function to be a (local) minimum, namely that the Hessian matrix of the Lagrangian function be positive definite along directions in the tangent space of the constraints.

Theorem 2.16: Second-Order Sufficient Conditions

Let $f : \mathbb{R}^{n_x} \to \mathbb{R}$ and $h_i : \mathbb{R}^{n_x} \to \mathbb{R}$, $i = 1, \ldots, n_h$, be twice continuously differentiable functions on $\mathbb{R}^{n_x}$. Consider the problem to minimize $f(x)$ subject to the constraints $h(x) = 0$. If $x^*$ and $\lambda^*$ satisfy

$$\nabla_x \mathcal{L}(x^*, \lambda^*) = \nabla f(x^*) + \sum_{i=1}^{n_h} \lambda_i^* \nabla h_i(x^*) = 0, \qquad \nabla_\lambda \mathcal{L}(x^*, \lambda^*) = h(x^*) = 0$$

and

$$y^\top \nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)\, y > 0, \quad \forall y \neq 0 \text{ such that } \nabla h(x^*)^\top y = 0$$

where $\mathcal{L}(x, \lambda) = f(x) + \lambda^\top h(x)$, then $x^*$ is a strict local minimum.

Proof. Consider the augmented Lagrangian function

$$\bar{\mathcal{L}}(x, \lambda) = f(x) + \lambda^\top h(x) + \frac{c}{2}\,\|h(x)\|^2$$

where $c$ is a scalar penalty parameter. We have

$$\nabla_x \bar{\mathcal{L}}(x^*, \lambda^*) = \nabla_x \mathcal{L}(x^*, \bar{\lambda}), \qquad \nabla^2_{xx} \bar{\mathcal{L}}(x^*, \lambda^*) = \nabla^2_{xx} \mathcal{L}(x^*, \bar{\lambda}) + c\, \nabla h(x^*)\, \nabla h(x^*)^\top$$

where $\bar{\lambda} = \lambda^* + c\, h(x^*)$ (so that $\bar{\lambda} = \lambda^*$, since $h(x^*) = 0$). Since $(x^*, \lambda^*)$ satisfy the sufficient conditions, by Lemma 2.4, we obtain

$$\nabla_x \bar{\mathcal{L}}(x^*, \lambda^*) = 0 \quad \text{and} \quad \nabla^2_{xx} \bar{\mathcal{L}}(x^*, \lambda^*) \succ 0$$

for sufficiently large $c$. The Hessian of $\bar{\mathcal{L}}$ being positive definite at $(x^*, \lambda^*)$,

$$\exists \varrho > 0,\ \delta > 0 \text{ such that } \bar{\mathcal{L}}(x, \lambda^*) \ge \bar{\mathcal{L}}(x^*, \lambda^*) + \frac{\varrho}{2}\|x - x^*\|^2 \quad \text{for } \|x - x^*\| < \delta$$

Finally, since $\bar{\mathcal{L}}(x, \lambda^*) = f(x)$ when $h(x) = 0$, and $\bar{\mathcal{L}}(x^*, \lambda^*) = f(x^*)$, we get

$$f(x) \ge f(x^*) + \frac{\varrho}{2}\|x - x^*\|^2 \quad \text{if } h(x) = 0,\ \|x - x^*\| < \delta$$

i.e., $x^*$ is a strict local minimum. □
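The role of the penalty term can be seen on a small synthetic example (matrices chosen by us for illustration): with an indefinite $\nabla^2_{xx}\mathcal{L}$ whose restriction to the tangent space is positive definite, adding $c\,\nabla h(x^*)\nabla h(x^*)^\top$ renders the full matrix positive definite once $c$ is large enough.

```python
import numpy as np

H = np.array([[1.0,  0.0],
              [0.0, -1.0]])            # indefinite Hessian of the Lagrangian
a = np.array([[0.0], [1.0]])           # grad h(x*): tangent space is span{e1},
                                       # on which H is positive definite
for c in (0.5, 1.0, 2.0):
    H_bar = H + c * (a @ a.T)          # Hessian of the augmented Lagrangian
    print(c, np.linalg.eigvalsh(H_bar))
# c = 0.5 -> eigenvalues [-0.5, 1.]; c = 2.0 -> [1., 1.]: positive definite
```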

Example 2.18. Consider the problem

$$\min_{x \in \mathbb{R}^3} f(x) := -x_1 x_2 - x_1 x_3 - x_2 x_3 \quad \text{s.t.} \quad h(x) := x_1 + x_2 + x_3 - 3 = 0$$

The first-order conditions for this problem are

$$-(x_2 + x_3) + \lambda = 0, \qquad -(x_1 + x_3) + \lambda = 0, \qquad -(x_1 + x_2) + \lambda = 0$$

together with the equality constraint. It is easily checked that the point $x_1^* = x_2^* = x_3^* = 1$, $\lambda^* = 2$ satisfies these conditions. Moreover,

$$\nabla^2_{xx}\mathcal{L}(x^*, \lambda^*) = \nabla^2 f(x^*) = \begin{pmatrix} 0 & -1 & -1 \\ -1 & 0 & -1 \\ -1 & -1 & 0 \end{pmatrix}$$

and a basis of the tangent space to the constraint $h(x) = 0$ at $x^*$ is

$$E := \begin{pmatrix} 0 & 2 \\ 1 & -1 \\ -1 & -1 \end{pmatrix}$$

We thus obtain

$$E^\top \nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)\, E = \begin{pmatrix} 2 & 0 \\ 0 & 6 \end{pmatrix}$$

which is clearly a positive definite matrix. Hence, $x^*$ is a strict local minimum of the problem. (Interestingly enough, the Hessian matrix of the objective function itself is indefinite at $x^*$ in this case.)
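These computations are easy to verify numerically (a sketch; note that the basis $E$ used here is the one from the example and is not orthonormal):

```python
import numpy as np

H = np.array([[ 0.0, -1.0, -1.0],
              [-1.0,  0.0, -1.0],
              [-1.0, -1.0,  0.0]])     # Hessian of L at x* = (1, 1, 1)
E = np.array([[ 0.0,  2.0],
              [ 1.0, -1.0],
              [-1.0, -1.0]])           # basis of ker(grad h^T), grad h = (1,1,1)

print(np.ones(3) @ E)                  # -> [0. 0.]: columns are tangent
print(E.T @ H @ E)                     # -> [[2. 0.] [0. 6.]]: positive definite
print(np.linalg.eigvalsh(H))           # -> [-2. 1. 1.]: H itself is indefinite
```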

We close this section by providing insight into the Lagrange multipliers.

Remark 2.26: A First Interpretation of the Lagrange Multipliers

The concept of Lagrange multipliers allows one to adjoin the constraints to the objective function. That is, one can view constrained optimization as a search for a vector $x^*$ at which the gradient of the objective function is a linear combination of the gradients of the constraints.

Another insightful interpretation of the Lagrange multipliers is as follows. Consider the family of perturbed problems $v(y) := \min\{f(x) : h(x) = y\}$. Suppose that there is a unique regular solution point for each $y$, and let $\xi^*(y) := \arg\min\{f(x) : h(x) = y\}$ denote the evolution of the optimal solution point as a function of the perturbation parameter $y$. Clearly,

$$v(0) = f(x^*) \quad \text{and} \quad \xi^*(0) = x^*$$

Moreover, since $h(\xi^*(y)) = y$ for each $y$, we have:

$$\frac{d}{dy}\, h(\xi^*(y)) = 1 = \sum_{j=1}^{n_x} \frac{\partial h}{\partial x_j} \frac{d\xi_j^*}{dy} = \nabla h^\top \frac{d\xi^*}{dy}$$

Denoting by $\lambda^*$ the Lagrange multiplier associated with the constraint $h(x) = 0$ in the original problem, we have

$$\left.\frac{dv}{dy}\right|_{y=0} = \nabla f(x^*)^\top \frac{d\xi^*(0)}{dy} = -\lambda^*\, \nabla h(x^*)^\top \frac{d\xi^*(0)}{dy} = -\lambda^*$$

Therefore, the Lagrange multiplier $\lambda^*$ can be interpreted as the sensitivity of the objective function $f$ with respect to the constraint $h$. Said differently, $\lambda^*$ indicates how much the optimal cost would change if the constraint were perturbed. This interpretation extends straightforwardly to NLP problems having inequality constraints. The Lagrange multiplier of an active constraint $g(x) \le 0$, say $\nu^* > 0$, can be interpreted as the sensitivity of $f(x^*)$ with respect to a change in that constraint, as $g(x) \le y$; in this case, the nonnegativity of the Lagrange multiplier follows from the fact that, by increasing $y$, the feasible region is relaxed, hence the optimal cost cannot increase. Regarding inactive constraints, the sensitivity interpretation also explains why their Lagrange multipliers are zero, as any infinitesimal change in the value of these constraints leaves the optimal cost value unchanged.
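This interpretation is easy to check on Example 2.15, where $\lambda^* = \frac{1}{2}$ at the minimum: differencing the optimal value of the perturbed problem $h(x) = y$ should give $dv/dy \approx -\frac{1}{2}$. A sketch (the closed-form minimizer below follows from the symmetry of the perturbed problem):

```python
import numpy as np

def v(y):
    """Optimal value of: min x1 + x2  s.t.  x1^2 + x2^2 - 2 = y.
    By symmetry, the minimizer is x1 = x2 = -sqrt((2 + y) / 2)."""
    return 2.0 * (-np.sqrt((2.0 + y) / 2.0))

eps = 1e-6
print((v(eps) - v(-eps)) / (2.0 * eps))    # -> approx -0.5 = -lambda*
```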

Remark 2.27: A Second Interpretation of the Lagrange Multipliers: Mechanical Analogy

It is possible to give an interesting physical interpretation of the KKT conditions and the Lagrange multipliers. Consider a ball on a hilly terrain, where the elevation of the terrain is described by the function $z = f(x)$, with $x = [x_1\ x_2]$ the horizontal coordinates. The gravitational potential is $V = mg f(x)$; we use normalized units so that $V = f(x)$. In the unconstrained case, an equilibrium point is a stationary point of the potential, that is, $\nabla V(x) = 0$. The generalized force due to gravity acting on the ball at a generic point $x$ is $F = -\nabla V(x) = -\nabla f(x)$. Note that the motion along the vertical axis is completely determined by $x$ (i.e., the system has two degrees of freedom). An equality constraint can be seen as a rail on which the ball is forced to slide, modeled as a curve of the form $h(x) = 0$. Such a constraint exerts a generalized reaction force at each point normal to the curve, hence in the direction of $\nabla h(x)$; the magnitude of this force depends on how much the gravitational force pushes against the constraint. The static equilibrium condition is $-\nabla f(x) = \lambda \nabla h(x)$. Inequality constraints, on the other hand, can be seen as barriers: the ball is forced to lie on the feasible side of them. Consider the inequality constraint $g(x) \le 0$. When the constraint is inactive, it exerts no force on the ball and hence does not affect the equilibrium equation. When the constraint is active, the reaction force is exerted in the direction $-\nabla g(x)$. Compared to rails, barriers are unilateral constraints and can only exert forces in one sense; this is the physical interpretation of the nonnegativity of the associated Lagrange multiplier, $\nu \ge 0$. In the presence of both equality and inequality constraints, the equilibrium equation becomes the first KKT condition. This interpretation is illustrated in Figure 2.9. The gravitational and reaction forces are $F = -\nabla f(x)$, $R_g = -\nu \nabla g(x)$, $R_h = -\lambda \nabla h(x)$. The force balance $F + R_g + R_h = 0$ is therefore the KKT stationarity condition with reversed sign.

Figure 2.9: Mechanical analogy: the stationarity condition corresponds to the force balance $F + R_g + R_h = 0$ (with reversed sign)