
2.8 General NLP Problems

In this section, we shall consider general, nonlinear programming problems with both equality and inequality constraints,

$$\begin{aligned}
\text{minimize} \quad & f(x) \\
\text{subject to} \quad & g_i(x) \le 0, \quad i = 1,\dots,n_g \\
& h_i(x) = 0, \quad i = 1,\dots,n_h.
\end{aligned}$$
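To fix ideas, here is a minimal numerical sketch of a problem of this form, using scipy's general-purpose `minimize` routine; the problem data are hypothetical and chosen only for illustration.

```python
# Hypothetical instance of the general NLP above:
#   minimize    f(x) = x1^2 + x2^2
#   subject to  g(x) = 1 - x1      <= 0
#               h(x) = x1 + x2 - 2  = 0
from scipy.optimize import minimize

f = lambda x: x[0]**2 + x[1]**2
constraints = [
    {"type": "ineq", "fun": lambda x: x[0] - 1.0},        # scipy convention: fun(x) >= 0
    {"type": "eq",   "fun": lambda x: x[0] + x[1] - 2.0},
]
result = minimize(f, x0=[2.0, 2.0], constraints=constraints)
print(result.x)  # approximately [1.0, 1.0]
```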

Before stating necessary and sufficient conditions for such problems, we shall start by revisiting the KKT conditions for inequality constrained problems, based on the method of Lagrange multipliers described in Section 2.7.

2.8.1 KKT Conditions for Inequality Constrained NLP Problems Revisited

Consider the problem of minimizing a function $f(x)$ for $x \in S := \{x \in \mathbb{R}^{n_x} : g_i(x) \le 0,\ i = 1,\dots,n_g\}$, and suppose that $x^*$ is a local minimum point. Clearly, $x^*$ is also a local minimum of the inequality constrained problem where the inactive constraints $g_i(x^*) < 0$, $i \notin \mathcal{A}(x^*)$, have been discarded. Thus, in effect, inactive constraints at $x^*$ do not matter; they can be ignored in the statement of optimality conditions.

On the other hand, active inequality constraints can be treated to a large extent as equality constraints at a local minimum point. In particular, it should be clear that $x^*$ is also a local minimum of the equality constrained problem

$$\text{minimize} \quad f(x) \quad \text{subject to} \quad g_i(x) = 0, \quad i \in \mathcal{A}(x^*).$$

That is, it follows from Theorem 2.13 that, if $x^*$ is a regular point, there exist unique Lagrange multipliers $\nu_i^*$, $i \in \mathcal{A}(x^*)$, such that

$$\nabla f(x^*) + \sum_{i \in \mathcal{A}(x^*)} \nu_i^* \nabla g_i(x^*) = 0.$$

Assigning zero Lagrange multipliers to the inactive constraints, we obtain

$$\nabla f(x^*) + \nabla g(x^*)\, \nu^* = 0, \qquad \nu_i^* = 0, \quad \forall i \notin \mathcal{A}(x^*).$$

This latter condition can be rewritten by means of the following complementarity slackness equations:

$$\nu_i^* \, g_i(x^*) = 0, \quad i = 1,\dots,n_g.$$
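As a concrete illustration (a hypothetical example, not from the text), consider minimizing $f(x) = (x_1 - 2)^2 + x_2^2$ subject to the single constraint $g(x) = x_1^2 + x_2^2 - 1 \le 0$. The minimum lies at $x^* = (1, 0)$, where the constraint is active. The stationarity condition

$$\nabla f(x^*) + \nu^* \nabla g(x^*) = \begin{pmatrix} -2 \\ 0 \end{pmatrix} + \nu^* \begin{pmatrix} 2 \\ 0 \end{pmatrix} = 0$$

gives $\nu^* = 1 > 0$, and complementarity holds since $g(x^*) = 0$. Had the constraint set been large enough for the unconstrained minimizer $(2, 0)$ to be feasible, the constraint would be inactive and complementarity would instead force $\nu^* = 0$.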

The argument showing that $\nu^* \ge 0$ is a little more elaborate. By contradiction, assume that $\nu_l^* < 0$ for some $l \in \mathcal{A}(x^*)$. Let $A \in \mathbb{R}^{(n_a+1) \times n_x}$ be the matrix whose rows are $\nabla f(x^*)^\mathsf{T}$ and $\nabla g_i(x^*)^\mathsf{T}$, $i \in \mathcal{A}(x^*)$, where $n_a$ denotes the number of active constraints. Since $x^*$ is a regular point, the Lagrange multiplier vector $\nu^*$ is unique. Therefore, the condition

$$A^\mathsf{T} y = 0$$

can only be satisfied by $y := \eta\,(1, \nu_{\mathcal{A}}^*)^\mathsf{T}$ with $\eta \in \mathbb{R}$, where $\nu_{\mathcal{A}}^*$ collects the multipliers of the active constraints. Because $\nu_l^* < 0$, no such $y$ can be both nonzero and nonnegative, so we know by Gordan's Theorem (Corollary 2.1 below) that there exists a direction $\bar d \in \mathbb{R}^{n_x}$ such that $A \bar d < 0$. In other words,

$$\bar d \in \mathcal{F}_0(x^*) \cap \mathcal{D}_0(x^*),$$

which contradicts the hypothesis that $x^*$ is a local minimizer of $f$ on $S$ (see Remark 2.18).

Overall, these results constitute the KKT optimality conditions as stated in Theorem 2.10. But although the foregoing development is straightforward, it is somewhat limited by the regularity-type assumption at the optimal solution. Obtaining more general constraint qualifications (see Remark 2.21) requires that the KKT conditions be derived via an alternative approach, e.g., the one described earlier in Section 2.6. Still, the conversion to an equality constrained problem proves useful in many situations, e.g., for deriving second-order sufficient conditions for inequality constrained NLP problems.

2.8.2 Optimality Conditions for General NLP Problems

We are now ready to generalize the necessary and sufficient conditions given in Theorems 2.10, 2.13, 2.15 and 2.16 to general NLP problems.

Theorem 2.17: First- and Second-Order Necessary Conditions

Let $f: \mathbb{R}^{n_x} \to \mathbb{R}$, $g_i: \mathbb{R}^{n_x} \to \mathbb{R}$, $i = 1,\dots,n_g$, and $h_i: \mathbb{R}^{n_x} \to \mathbb{R}$, $i = 1,\dots,n_h$, be twice continuously differentiable functions on $\mathbb{R}^{n_x}$. Consider the problem of minimizing $f(x)$ subject to the constraints $h(x) = 0$ and $g(x) \le 0$. If $x^*$ is a local minimum of the optimization problem and is a regular point of the constraints, then there exist unique vectors $\nu^* \in \mathbb{R}^{n_g}$ and $\lambda^* \in \mathbb{R}^{n_h}$ such that

$$\begin{aligned}
& \nabla f(x^*) + \nabla h(x^*)\,\lambda^* + \nabla g(x^*)\,\nu^* = 0 \\
& g(x^*) \le 0, \qquad h(x^*) = 0 \\
& \nu^* \ge 0, \qquad g(x^*)^\mathsf{T} \nu^* = 0
\end{aligned}$$

and

$$y^\mathsf{T} \left( \nabla^2 f(x^*) + \sum_{i=1}^{n_g} \nu_i^* \nabla^2 g_i(x^*) + \sum_{i=1}^{n_h} \lambda_i^* \nabla^2 h_i(x^*) \right) y \ge 0$$

for all $y$ such that $\nabla g_i(x^*)^\mathsf{T} y = 0$, $i \in \mathcal{A}(x^*)$, and $\nabla h(x^*)^\mathsf{T} y = 0$.
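The first-order part of these conditions is easy to check numerically at a candidate point. The sketch below (hypothetical problem data, continuing the example used earlier) verifies stationarity, primal and dual feasibility, and complementarity with numpy.

```python
# Check the first-order KKT system of Theorem 2.17 at a candidate point for
#   min (x1 - 2)^2 + x2^2   s.t.  g(x) = x1^2 + x2^2 - 1 <= 0,  h(x) = x2 = 0,
# with candidate x* = (1, 0), nu* = (1,), lambda* = (0,).
import numpy as np

x = np.array([1.0, 0.0])
nu, lam = np.array([1.0]), np.array([0.0])

grad_f = np.array([2.0 * (x[0] - 2.0), 2.0 * x[1]])
Jg = np.array([[2.0 * x[0], 2.0 * x[1]]])   # Jacobian of g, shape (1, 2)
Jh = np.array([[0.0, 1.0]])                 # Jacobian of h, shape (1, 2)
g = np.array([x[0]**2 + x[1]**2 - 1.0])
h = np.array([x[1]])

print(np.allclose(grad_f + Jg.T @ nu + Jh.T @ lam, 0))  # stationarity
print(np.all(g <= 1e-12) and np.allclose(h, 0))         # primal feasibility
print(np.all(nu >= 0) and np.allclose(nu * g, 0))       # nu >= 0, complementarity
```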

Theorem 2.18: Second-Order Sufficient Conditions

Let $f: \mathbb{R}^{n_x} \to \mathbb{R}$, $g_i: \mathbb{R}^{n_x} \to \mathbb{R}$, $i = 1,\dots,n_g$, and $h_i: \mathbb{R}^{n_x} \to \mathbb{R}$, $i = 1,\dots,n_h$, be twice continuously differentiable functions on $\mathbb{R}^{n_x}$. Consider the problem of minimizing $f(x)$ subject to the constraints $h(x) = 0$ and $g(x) \le 0$. If there exist $x^*$, $\nu^*$ and $\lambda^*$ satisfying the KKT conditions in Theorem 2.17, and

$$y^\mathsf{T} \nabla_{xx}^2 \mathcal{L}(x^*, \nu^*, \lambda^*) \, y > 0$$

for all $y \ne 0$ such that

$$\begin{aligned}
\nabla g_i(x^*)^\mathsf{T} y &= 0, \quad i \in \mathcal{A}(x^*) \text{ with } \nu_i^* > 0 \\
\nabla g_i(x^*)^\mathsf{T} y &\le 0, \quad i \in \mathcal{A}(x^*) \text{ with } \nu_i^* = 0 \\
\nabla h(x^*)^\mathsf{T} y &= 0,
\end{aligned}$$

where $\mathcal{L}(x, \nu, \lambda) := f(x) + h(x)^\mathsf{T} \lambda + g(x)^\mathsf{T} \nu$, then $x^*$ is a strict local minimum.
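When every active multiplier is strictly positive, the critical directions above form exactly the null space of the active constraint Jacobian, and the condition can be tested by projecting the Hessian of the Lagrangian onto that null space. A minimal sketch, assuming the same hypothetical problem as before (without the equality constraint):

```python
# Second-order check for  min (x1 - 2)^2 + x2^2  s.t. x1^2 + x2^2 - 1 <= 0
# at x* = (1, 0) with nu* = 1 > 0 (constraint active, strict multiplier).
import numpy as np
from scipy.linalg import null_space

nu = 1.0
hess_L = np.array([[2.0, 0.0], [0.0, 2.0]]) \
       + nu * np.array([[2.0, 0.0], [0.0, 2.0]])   # Hessian of the Lagrangian

Jg_active = np.array([[2.0, 0.0]])   # gradient of the active constraint at x*
Z = null_space(Jg_active)            # basis of critical directions, here span{(0, 1)}
reduced_hessian = Z.T @ hess_L @ Z
print(np.all(np.linalg.eigvalsh(reduced_hessian) > 0))  # True => strict local minimum
```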

Likewise, the KKT sufficient conditions given in Theorem 2.11 for convex, inequality constrained problems can be extended to general convex problems as follows:

Theorem 2.19: KKT Sufficient Conditions for Convex Programs

Let $f: \mathbb{R}^{n_x} \to \mathbb{R}$ and $g_i: \mathbb{R}^{n_x} \to \mathbb{R}$, $i = 1,\dots,n_g$, be convex and differentiable functions. Let also $h_i: \mathbb{R}^{n_x} \to \mathbb{R}$, $i = 1,\dots,n_h$, be affine functions. Consider the problem to minimize $f(x)$ subject to $x \in S := \{x \in \mathbb{R}^{n_x} : g(x) \le 0,\ h(x) = 0\}$. If $(x^*, \nu^*, \lambda^*)$ satisfies the KKT conditions of Theorem 2.17, then $x^*$ is a global minimizer for $f$ on $S$.
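For instance, in the hypothetical example used in the sketches above, $f$ and $g$ are convex and the only equality constraint is affine, so Theorem 2.19 upgrades the KKT point $x^* = (1, 0)$ from a candidate to a global minimizer.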

Theorem 2.20: Farkas’ Theorem

Let $A \in \mathbb{R}^{m \times n}$ and $c \in \mathbb{R}^n$. Then, exactly one of the following two statements holds:

System 1. $\exists x \in \mathbb{R}^n$ such that $Ax \le 0$ and $c^\mathsf{T} x > 0$,
System 2. $\exists y \in \mathbb{R}^m$ such that $A^\mathsf{T} y = c$ and $y \ge 0$.

Proof. See, e.g., [6, Theorem 2.4.5] for a proof. □

Farkas' Theorem is used extensively in the derivation of optimality conditions for (linear and) nonlinear programming problems. A geometrical interpretation of Farkas' Theorem is shown in Fig. 1.A.1. If $a_1, \dots, a_m$ denote the rows of $A$, then System 2 has a solution if $c$ lies in the convex cone generated by $a_1, \dots, a_m$; on the other hand, System 1 has a solution if the closed convex cone $\{x : Ax \le 0\}$ and the open half-space $\{x : c^\mathsf{T} x > 0\}$ have a nonempty intersection.
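Since System 2 is a linear feasibility problem, the alternative that holds for given data can be decided with a linear programming solver. A small sketch, assuming scipy and hypothetical data:

```python
# Decide which Farkas alternative holds by testing feasibility of
# System 2: A^T y = c, y >= 0 (a linear feasibility problem).
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 0.0],
              [0.0, 1.0]])          # rows a1, a2 generate the first-orthant cone
c_vec = np.array([1.0, 1.0])        # this c lies inside that cone

res = linprog(c=np.zeros(A.shape[0]), A_eq=A.T, b_eq=c_vec,
              bounds=[(0.0, None)] * A.shape[0])
if res.status == 0:                 # feasible => System 2 holds
    print("System 2 holds, y =", res.x)       # here y = (1, 1)
else:                               # infeasible => System 1 must hold
    print("System 1 holds: some x with Ax <= 0 and c^T x > 0")
```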

Corollary 2.1: Gordan’s Theorem

Let $A \in \mathbb{R}^{m \times n}$. Then, exactly one of the following two statements holds:

System 1. $\exists x \in \mathbb{R}^n$ such that $Ax < 0$,
System 2. $\exists y \in \mathbb{R}^m$, $y \ne 0$, such that $A^\mathsf{T} y = 0$ and $y \ge 0$.

Proof. System 1 can be written equivalently as $Ax + \rho e \le 0$, where $\rho > 0$ is a scalar and $e$ is a vector of $m$ ones. Rewriting this in the form of System 1 in Farkas' Theorem 2.20, we get $(A \;\, e)\, p \le 0$ and $(0, \dots, 0, 1)\, p > 0$, where $p := (x, \rho)^\mathsf{T}$. The associated System 2 by Theorem 2.20 states that $(A \;\, e)^\mathsf{T} y = (0, \dots, 0, 1)^\mathsf{T}$ and $y \ge 0$ for some $y \in \mathbb{R}^m$, i.e., $A^\mathsf{T} y = 0$, $e^\mathsf{T} y = 1$ and $y \ge 0$, which is equivalent to System 2 of the corollary. □

Lemma 2.4

Let $P$ and $Q$ be two symmetric matrices such that $Q \succeq 0$ and $P \succ 0$ on the null space of $Q$ (i.e., $y^\mathsf{T} P y > 0$ for all $y \ne 0$ such that $Qy = 0$). Then,

$$\exists \bar c > 0 \ \text{ such that } \ P + cQ \succ 0, \quad \forall c > \bar c.$$

Proof. Assume the contrary. Then,

$$\forall k > 0, \ \exists x_k \text{ with } \|x_k\| = 1 \ \text{ such that } \ x_k^\mathsf{T} P x_k + k\, x_k^\mathsf{T} Q x_k \le 0.$$

Consider a subsequence $\{x_k\}_{k \in \mathcal{K}}$ converging to some $\bar x$ with $\|\bar x\| = 1$. Dividing the last inequality by $k$, and taking the limit as $k \to \infty$ with $k \in \mathcal{K}$, we obtain

$$\bar x^\mathsf{T} Q \bar x \le 0.$$

On the other hand, Q being positive semidefinite, we must have

$$\bar x^\mathsf{T} Q \bar x \ge 0.$$

Hence $\bar x^\mathsf{T} Q \bar x = 0$, and since $Q \succeq 0$ this forces $Q \bar x = 0$ (writing $Q = R^\mathsf{T} R$, we get $\|R \bar x\|^2 = 0$). Using the hypothesis, it follows that $\bar x^\mathsf{T} P \bar x > 0$. This contradicts the fact that

$$\bar x^\mathsf{T} P \bar x + \limsup_{k \to \infty,\ k \in \mathcal{K}} k\, x_k^\mathsf{T} Q x_k \le 0,$$

which follows from the defining inequality of $x_k$, since $x_k^\mathsf{T} P x_k \to \bar x^\mathsf{T} P \bar x$ and $k\, x_k^\mathsf{T} Q x_k \ge 0$ for all $k$. □
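The lemma is easy to visualize numerically. In the sketch below (hypothetical matrices), $P$ is indefinite but positive definite on the null space of $Q$, and $P + cQ$ becomes positive definite once $c$ exceeds the threshold $\bar c = 1$:

```python
# Numerical illustration of Lemma 2.4 with hypothetical 2x2 matrices.
import numpy as np

P = np.array([[1.0,  0.0],
              [0.0, -1.0]])   # indefinite, but y^T P y > 0 on null(Q) = span{e1}
Q = np.array([[0.0, 0.0],
              [0.0, 1.0]])    # positive semidefinite, null space span{e1}

for c in [0.5, 1.0, 1.5, 2.0]:
    lam_min = np.linalg.eigvalsh(P + c * Q).min()
    print(f"c = {c}: min eigenvalue of P + cQ = {lam_min:+.2f}")
# P + cQ = diag(1, c - 1): indefinite for c <= 1, positive definite for c > 1.
```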