
2.6 Problems with inequality constraints

In practice, few problems can be formulated as unconstrained programs: the feasible region is generally restricted by imposing constraints on the optimization variables. In this section, we first present theoretical results for the problem of minimizing f(x) subject to x ∈ S, for a general set S (geometric optimality conditions). Then, we let S be defined more specifically as:

Definition 2.16: Inequality Constrained Program

minimize f(x)  subject to  g(x) ≤ 0

The feasible region of this NLP is defined by a set of nonlinear inequalities gi(x) ≤ 0, i = 1,…,ng. For the inequality constrained program, we derive the Karush-Kuhn-Tucker (KKT) conditions of optimality in the following.

2.6.1 Geometric Optimality Conditions

Definition 2.17: Feasible Direction

Let S be a nonempty set in ℝ^nx. A vector d ∈ ℝ^nx, d ≠ 0, is said to be a feasible direction at x̄ ∈ cl(S) (here, cl(S) denotes the closure of the set S, that is, the union of S with all of its limit points) if

∃δ > 0 such that x̄ + ηd ∈ S, ∀η ∈ (0, δ).

Moreover, the cone of feasible directions at x̄, denoted by 𝒟(x̄), is given by

𝒟(x̄) := {d ≠ 0 : ∃δ > 0 such that x̄ + ηd ∈ S, ∀η ∈ (0, δ)}.
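For intuition, feasibility of a direction can be probed numerically by sampling ever smaller steps. The following minimal sketch assumes S is given in the inequality form of Section 2.6.2 below, S = {x : g(x) ≤ 0}; the helper name, step grid, and example constraint are all illustrative choices, not part of the original text.

import numpy as np

# Illustrative helper: probe whether d behaves like a feasible direction at
# x_bar for S = {x : g(x) <= 0 componentwise} by sampling small steps eta.
# This is a numerical plausibility check, not a proof that d lies in D(x_bar).
def seems_feasible_direction(g, x_bar, d, etas=np.logspace(-8, -2, 25)):
    return all(np.all(g(x_bar + eta * d) <= 0.0) for eta in etas)

# Example: S = {x in R^2 : x1 <= 0}, with x_bar = (0, 0) on the boundary.
g = lambda x: np.array([x[0]])
print(seems_feasible_direction(g, np.zeros(2), np.array([-1.0, 1.0])))  # True
print(seems_feasible_direction(g, np.zeros(2), np.array([1.0, 0.0])))   # False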

From Definition 2.17 and Lemma 2.1, it is clear that a small movement from x̄ along a direction d ∈ 𝒟(x̄) leads to feasible points, whereas a similar movement along a direction d ∈ ℱ0(x̄) leads to points of improving objective value (see Definition 2.15). As shown in Theorem 2.9 below, a (geometric) necessary condition for local optimality is that "every improving direction is not a feasible direction". This fact is illustrated in Figure ??, where both the half-space ℱ0(x̄) and the cone 𝒟(x̄) (see Definition 2.3) are translated from the origin to x̄ for clarity.

Theorem 2.9: Geometric Necessary Condition for a Local Minimum

Let S be a nonempty set in ℝ^nx and let f : ℝ^nx → ℝ be a differentiable function. Suppose that x̄ is a local minimizer of the problem to minimize f(x) subject to x ∈ S. Then, ℱ0(x̄) ∩ 𝒟(x̄) = ∅.

Proof. By contradiction, suppose that there exists a vector d ∈ ℱ0(x̄) ∩ 𝒟(x̄), d ≠ 0. Then, by Lemma 2.1,

∃δ1 > 0 such that f(x̄ + ηd) < f(x̄), ∀η ∈ (0, δ1).

Moreover, by Definition 2.17,

∃δ2 > 0 such that x̄ + ηd ∈ S, ∀η ∈ (0, δ2).

Hence,

x̄ + ηd ∈ S with f(x̄ + ηd) < f(x̄)

for every η ∈ (0, min{δ1, δ2}), which contradicts the assumption that x̄ is a local minimum of f on S (see Definition 2.11). □
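The geometric condition of Theorem 2.9 is easy to test by sampling on a toy problem. In the sketch below, the problem, the point, and the sample size are all illustrative: we minimize f(x) = x1² + x2² over S = {x : x1 ≥ 1}, whose minimum is x̄ = (1, 0), and verify that no sampled direction is simultaneously feasible and improving.

import numpy as np

# Randomized sanity check of Theorem 2.9 on an illustrative problem:
# minimize x1^2 + x2^2 over S = {x : x1 >= 1}; the minimum is x_bar = (1, 0).
rng = np.random.default_rng(0)
x_bar = np.array([1.0, 0.0])
grad_f = 2.0 * x_bar                     # gradient of f at x_bar
for _ in range(1000):
    d = rng.standard_normal(2)
    feasible = d[0] >= 0.0               # x_bar + eta*d stays in S for small eta
    improving = grad_f @ d < 0.0         # d lies in F0(x_bar)
    assert not (feasible and improving)  # F0(x_bar) and D(x_bar) are disjoint
print("no sampled direction is both feasible and improving")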

2.6.2 KKT Conditions

We now specify the feasible region as

S := {x : gi(x) ≤ 0, i = 1,…,ng}

where gi : ℝ^nx → ℝ, i = 1,…,ng, are continuous functions. In the geometric optimality condition given by Theorem 2.9, 𝒟(x̄) is the cone of feasible directions. From a practical viewpoint, it is desirable to convert this geometric condition into a more usable condition involving algebraic equations. As Lemma 2.2 below indicates, we can define a cone 𝒟0(x̄) in terms of the gradients of the active constraints at x̄, such that 𝒟0(x̄) ⊆ 𝒟(x̄). For this, we need the following:

Definition 2.18: Active Constraint, Active Set

Let gi : ℝ^nx → ℝ, i = 1,…,ng, and consider the set S := {x : gi(x) ≤ 0, i = 1,…,ng}. Let x̄ ∈ S be a feasible point. For each i = 1,…,ng, the constraint gi is said to be active or binding at x̄ if gi(x̄) = 0; it is said to be inactive at x̄ if gi(x̄) < 0. Moreover,

𝒜(x̄) := {i : gi(x̄) = 0}

denotes the set of active constraints at x̄.
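In computations, membership in 𝒜(x̄) is decided only up to a tolerance, since gi(x̄) is rarely exactly zero in floating-point arithmetic. A minimal sketch (the helper name, tolerance, and example constraints are illustrative):

import numpy as np

# Sketch: compute the active set A(x_bar) = {i : g_i(x_bar) = 0}, testing
# "equal to zero" within a tolerance tol to account for rounding errors.
def active_set(g, x_bar, tol=1e-8):
    return [i for i, gi in enumerate(np.asarray(g(x_bar))) if abs(gi) <= tol]

# Illustrative constraints g1(x) = x1 + x2 - 1 <= 0 and g2(x) = -x1 <= 0:
g = lambda x: np.array([x[0] + x[1] - 1.0, -x[0]])
print(active_set(g, np.array([0.0, 0.5])))  # [1]: only g2 is active
print(active_set(g, np.array([0.0, 1.0])))  # [0, 1]: both constraints active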

Lemma 2.2: Algebraic Characterization of a Feasible Direction

Let gi : ℝ^nx → ℝ, i = 1,…,ng, be differentiable functions, and consider the set S := {x : gi(x) ≤ 0, i = 1,…,ng}. For any feasible point x̄ ∈ S, we have

𝒟0(x̄) := {d : ∇gi(x̄)ᵀd < 0, ∀i ∈ 𝒜(x̄)} ⊆ 𝒟(x̄).

Proof. Suppose 𝒟0(x̄) is nonempty, and let d ∈ 𝒟0(x̄). Since ∇gi(x̄)ᵀd < 0 for each i ∈ 𝒜(x̄), then by Lemma 2.3, d is a descent direction for gi at x̄, i.e.,

∃δ2 > 0 such that gi(x̄ + ηd) < gi(x̄) = 0, ∀η ∈ (0, δ2), ∀i ∈ 𝒜(x̄).

Moreover, since gi(x̄) < 0 and gi is continuous at x̄ (since it is differentiable) for each i ∉ 𝒜(x̄),

∃δ1 > 0 such that gi(x̄ + ηd) < 0, ∀η ∈ (0, δ1), ∀i ∉ 𝒜(x̄).

Overall, it is thus clear that the points x̄ + ηd are in S for all η ∈ (0, min{δ1, δ2}). Hence, by Definition 2.17, d ∈ 𝒟(x̄). □

Remark 2.18

This lemma, together with Theorem 2.9, directly leads to the result that ℱ0(x̄) ∩ 𝒟0(x̄) = ∅ for any local solution point x̄, i.e.,

arg min{f(x) : x ∈ S} ⊆ {x ∈ ℝ^nx : ℱ0(x) ∩ 𝒟0(x) = ∅}.

The foregoing geometric characterization of local solution points applies equally well to interior points int(S) := {x ∈ ℝ^nx : gi(x) < 0, i = 1,…,ng} and to points on the boundary of the feasible domain. At an interior point, in particular, any direction is feasible, and the necessary condition ℱ0(x̄) ∩ 𝒟0(x̄) = ∅ reduces to ∇f(x̄) = 0, which is the same condition as in unconstrained optimization (see Theorem 2.3).

Note also that there are several cases in which the condition ℱ0(x̄) ∩ 𝒟0(x̄) = ∅ is satisfied by non-optimal points. In other words, this condition is necessary but not sufficient for a point x̄ to be a local minimum of f on S. For instance, any point x̄ with ∇gi(x̄) = 0 for some i ∈ 𝒜(x̄) makes 𝒟0(x̄) empty and thus trivially satisfies the condition ℱ0(x̄) ∩ 𝒟0(x̄) = ∅. Another example is given below.

Example 2.11. Consider the problem

min_{x∈ℝ²} f(x) := x1² + x2²   s.t.  g1(x) := x1 ≤ 0,  g2(x) := −x1 ≤ 0

Clearly, this problem is convex and x* = (0, 0)ᵀ is the unique global minimum.

Now, let x̄ be any point on the line 𝒞 := {x : x1 = 0}. Both constraints g1 and g2 are active at x̄, and we have ∇g1(x̄) = −∇g2(x̄) = (1, 0)ᵀ. Therefore, no direction d ≠ 0 can be found such that ∇g1(x̄)ᵀd < 0 and ∇g2(x̄)ᵀd < 0 simultaneously, i.e., 𝒟0(x̄) = ∅. In turn, this implies that ℱ0(x̄) ∩ 𝒟0(x̄) = ∅ is trivially satisfied at any point on 𝒞.

On the other hand, observe that the condition ℱ0(x̄) ∩ 𝒟(x̄) = ∅ in Theorem 2.9 excludes all the points on 𝒞 except the origin, since a feasible direction at x̄ is given, e.g., by d = (0, 1)ᵀ.
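The emptiness of 𝒟0(x̄) in this example can be made concrete in a few lines of code; the membership test below is a small illustrative helper, not part of the original text.

import numpy as np

# Example 2.11: rows of J are the active constraint gradients at any x_bar
# on the line C, namely grad g1 = (1, 0) and grad g2 = (-1, 0).
J = np.array([[1.0, 0.0], [-1.0, 0.0]])

def in_D0(J, d):
    # d belongs to D0(x_bar) iff every active gradient has a strictly
    # negative inner product with d.
    return bool(np.all(J @ d < 0.0))

print(in_D0(J, np.array([-1.0, 0.0])))  # False: grad g2 . d = 1 > 0
print(in_D0(J, np.array([0.0, 1.0])))   # False, even though d is feasible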

We now reduce the geometric necessary optimality condition ℱ0(x̄) ∩ 𝒟0(x̄) = ∅ to a statement in terms of the gradients of the objective function and of the active constraints. The resulting first-order optimality conditions are known as the Karush-Kuhn-Tucker (KKT) necessary conditions. Beforehand, we introduce the important concepts of a regular point and of a KKT point.

Definition 2.19: Regular Point (for a Set of Inequality Constraints)

Let gi : ℝ^nx → ℝ, i = 1,…,ng, be differentiable functions on ℝ^nx and consider the set S := {x ∈ ℝ^nx : gi(x) ≤ 0, i = 1,…,ng}. A point x̄ ∈ S is said to be a regular point if the gradient vectors ∇gi(x̄), i ∈ 𝒜(x̄), are linearly independent:

rank(∇gi(x̄), i ∈ 𝒜(x̄)) = |𝒜(x̄)|.
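Regularity is just a rank condition, so it is straightforward to test numerically. In the sketch below, the helper name is illustrative; the two test matrices are the active constraint gradients of Examples 2.12 and 2.13 further down.

import numpy as np

# Sketch: test regularity (LICQ) at x_bar by checking that the active
# constraint gradients, stacked as rows of J_active, have full row rank.
def is_regular(J_active):
    J_active = np.atleast_2d(J_active)
    return np.linalg.matrix_rank(J_active) == J_active.shape[0]

# Example 2.12: grad g1 = (1,1,1) and grad g2 = (1,0,0) are independent.
print(is_regular(np.array([[1.0, 1.0, 1.0], [1.0, 0.0, 0.0]])))  # True
# Example 2.13: the active gradients (-2,0) and (2,0) at x* = (0,0) are not.
print(is_regular(np.array([[-2.0, 0.0], [2.0, 0.0]])))           # False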

Definition 2.20: KKT Point

Let f : ℝ^nx → ℝ and gi : ℝ^nx → ℝ, i = 1,…,ng, be differentiable functions. Consider the problem to minimize f(x) subject to gi(x) ≤ 0, i = 1,…,ng. If a point (x̄, ν̄) ∈ ℝ^nx × ℝ^ng satisfies the conditions:

∇f(x̄) + ∇g(x̄)ν̄ = ∇f(x̄) + Σ_{i=1}^{ng} ν̄i ∇gi(x̄) = 0   (2.4a)
ν̄ ≥ 0   (2.4b)
g(x̄) ≤ 0   (2.4c)
ν̄ᵀ g(x̄) = 0   (2.4d)

then (x̄, ν̄) is said to be a KKT point. (Note that ∇g = ∂g/∂x = [∇g1 ⋯ ∇gng].)

Remark 2.19

The scalars νi, i = 1,…,ng, are called the Lagrange multipliers. Condition (2.4a) is called the stationarity condition; condition (2.4b) is referred to as the dual feasibility (DF) condition; condition (2.4c), i.e., the requirement that x̄ be feasible, is called the primal feasibility (PF) condition; finally, condition (2.4d) is called the complementary slackness (CS) condition, which reads componentwise as

ν̄i gi(x̄) = 0 for i = 1,…,ng.
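Conditions (2.4a)-(2.4d) translate directly into a numerical test for a candidate pair (x̄, ν̄). The sketch below is illustrative: the function names and tolerance are assumptions, and jac_g is taken to return the Jacobian of g whose rows are the gradients ∇gi. It returns True for the KKT point of Example 2.12 further down.

import numpy as np

# Sketch: check conditions (2.4a)-(2.4d) for a candidate pair (x_bar, nu_bar).
def is_kkt_point(grad_f, g, jac_g, x_bar, nu_bar, tol=1e-8):
    stationarity = grad_f(x_bar) + jac_g(x_bar).T @ nu_bar        # (2.4a)
    dual_feas    = np.all(nu_bar >= -tol)                         # (2.4b)
    primal_feas  = np.all(g(x_bar) <= tol)                        # (2.4c)
    compl_slack  = np.abs(nu_bar * g(x_bar)).max() <= tol         # (2.4d)
    return bool(np.abs(stationarity).max() <= tol and dual_feas
                and primal_feas and compl_slack)

# Data of Example 2.12 below: f(x) = (x1^2+x2^2+x3^2)/2, g1(x) = x1+x2+x3+3,
# g2(x) = x1; the pair x_bar = (-1,-1,-1), nu_bar = (1,0) is a KKT point.
grad_f = lambda x: x
g      = lambda x: np.array([x[0] + x[1] + x[2] + 3.0, x[0]])
jac_g  = lambda x: np.array([[1.0, 1.0, 1.0], [1.0, 0.0, 0.0]])
print(is_kkt_point(grad_f, g, jac_g,
                   np.array([-1.0, -1.0, -1.0]), np.array([1.0, 0.0])))  # True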

Theorem 2.10: KKT Necessary Conditions

Let f : ℝ^nx → ℝ and gi : ℝ^nx → ℝ, i = 1,…,ng, be differentiable functions. Consider the problem to minimize f(x) subject to gi(x) ≤ 0, i = 1,…,ng. If x* is a local minimum and a regular point of the constraints, then there exists a unique vector ν* such that (x*, ν*) is a KKT point.

Proof. Since x* solves the problem, there exists no direction d ∈ ℝ^nx such that ∇f(x*)ᵀd < 0 and ∇gi(x*)ᵀd < 0, i ∈ 𝒜(x*), simultaneously (see Remark 2.18). Let A ∈ ℝ^{(|𝒜(x*)|+1)×nx} be the matrix whose rows are ∇f(x*)ᵀ and ∇gi(x*)ᵀ, i ∈ 𝒜(x*). Clearly, the system {d ∈ ℝ^nx : Ad < 0} is infeasible and, by Corollary 2.1, there exists a nonzero vector p ≥ 0 in ℝ^{|𝒜(x*)|+1} such that Aᵀp = 0. Denoting the components of p by u0 and ui for i ∈ 𝒜(x*), we get

u0 ∇f(x*) + Σ_{i∈𝒜(x*)} ui ∇gi(x*) = 0,

where u0 ≥ 0 and ui ≥ 0 for i ∈ 𝒜(x*), with (u0, u_𝒜(x*)) ≠ (0, 0) (here, u_𝒜(x*) is the vector whose components are the ui's for i ∈ 𝒜(x*)). Letting ui = 0 for i ∉ 𝒜(x*), we then get the conditions:

u0 ∇f(x*) + ∇g(x*)u = u0 ∇f(x*) + Σ_{i=1}^{ng} ui ∇gi(x*) = 0
uᵀ g(x*) = 0
(u0, u) ≥ (0, 0)
(u0, u) ≠ (0, 0)

where u is the vector whose components are ui for i = 1,…,ng. Note that u0 ≠ 0, since otherwise the assumption of linear independence of the active constraints at x* would be violated. (Indeed, if u0 = 0, then u ≥ 0, u ≠ 0, and by Corollary 2.1 we have Σ_{i∈𝒜(x*)} ui ∇gi(x*) = 0 with at least one ui ≠ 0, thus violating the linear independence assumption.) Then, letting ν* = (1/u0) u, we obtain that (x*, ν*) is a KKT point. □

Remark 2.20

One of the major difficulties in applying the foregoing result is that we do not know a priori which constraints are active and which are inactive at a solution, i.e., the active set is unknown. Therefore, it is necessary to investigate all possible active sets to find the candidate points satisfying the KKT conditions. This is illustrated in the example below.

Example 2.12 (Regular Case). Consider the problem

min_{x∈ℝ³} f(x) := ½(x1² + x2² + x3²)
s.t. g1(x) := x1 + x2 + x3 + 3 ≤ 0
     g2(x) := x1 ≤ 0

Note that every feasible point is regular, since ∇g1 = (1, 1, 1)ᵀ and ∇g2 = (1, 0, 0)ᵀ are linearly independent, so x* must satisfy the stationarity conditions:

x1 + ν1 + ν2 = 0
x2 + ν1 = 0
x3 + ν1 = 0

Four cases can be distinguished:

  • The constraints g1 and g2 are both inactive, i.e., x1 + x2 + x3 < −3, x1 < 0, and ν1 = ν2 = 0. From the latter together with the stationarity conditions, we get x1 = x2 = x3 = 0, hence contradicting the former primal feasibility condition.
  • The constraint g1 is inactive, while g2 is active, i.e., x1 + x2 + x3 < −3, x1 = 0, ν1 = 0, and ν2 ≥ 0. From the latter, together with the stationarity conditions, we get x2 = x3 = 0, hence contradicting the former once again.
  • The constraint g1 is active, while g2 is inactive, i.e., x1 + x2 + x3 = −3, x1 < 0, ν1 ≥ 0, and ν2 = 0. Then, the point (x*, ν*) such that x1* = x2* = x3* = −1, ν1* = 1, and ν2* = 0 is a KKT point.
  • The constraints g1 and g2 are both active, i.e., x1 + x2 + x3 = −3, x1 = 0, and ν1, ν2 ≥ 0. Then, we obtain x2 = x3 = −3/2, ν1 = 3/2, and ν2 = −3/2, hence contradicting the dual feasibility condition ν2 ≥ 0.

Overall, there is thus a unique candidate for a local minimum. Yet, at this stage it cannot be concluded whether this point is actually a global minimum, or even a local minimum.
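As a cross-check, the unique KKT candidate can be compared against the output of a numerical solver. The sketch below uses SciPy's SLSQP method; note that SciPy's 'ineq' constraints require fun(x) ≥ 0, so each gi(x) ≤ 0 is passed as −gi(x) ≥ 0.

import numpy as np
from scipy.optimize import minimize

# Numerical cross-check of Example 2.12.
f = lambda x: 0.5 * np.dot(x, x)
cons = [{'type': 'ineq', 'fun': lambda x: -(x[0] + x[1] + x[2] + 3.0)},  # g1 <= 0
        {'type': 'ineq', 'fun': lambda x: -x[0]}]                        # g2 <= 0
res = minimize(f, x0=np.zeros(3), method='SLSQP', constraints=cons)
print(res.x)  # approximately [-1, -1, -1], matching the KKT point found above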

Remark 2.21: Constraint Qualification

It is very important to note that, for a local minimum x* to be a KKT point, an additional condition must be placed on the behaviour of the constraints; that is, not every local minimum is a KKT point. Such a condition is known as a constraint qualification. In Theorem 2.10, it is shown that one possible constraint qualification is that x* be a regular point, which is the well-known linear independence constraint qualification (LICQ). A weaker constraint qualification (i.e., one implied by LICQ), known as the Mangasarian-Fromovitz constraint qualification (MFCQ), requires that there exist (at least) one direction d ∈ 𝒟0(x*), i.e., such that ∇gi(x*)ᵀd < 0 for each i ∈ 𝒜(x*). Note, however, that the Lagrange multipliers are guaranteed to be unique if LICQ holds (as stated in Theorem 2.10), while this uniqueness property may be lost under MFCQ.

The following example illustrates the necessity of a constraint qualification for a local minimum point of an NLP to be a KKT point.

Example 2.13 (Non-Regular Case). Consider the problem

min_{x∈ℝ²} f(x) := x1² + (x2 − 1)²   s.t.  g1(x) := x2³ − 2x1 ≤ 0,  g2(x) := x2³ + 2x1 ≤ 0

The feasible region is shown in Fig. 2.7 below. Note that a minimum point is x* = (0, 0)ᵀ. The stationarity condition relative to the variable x2 reads

2(x2 − 1) + 3x2²(ν1 + ν2) = 0,

which, computed at x*, reduces to −2 = 0. It is readily seen that this condition cannot be met at the local minimum point x*. In other words, the KKT conditions are not necessary conditions in this example. This is because no constraint qualification holds at x*. In particular, x* not being a regular point, LICQ does not hold. Moreover, the set 𝒟0(x*) being empty (the direction d = (0, 1)ᵀ gives ∇g1(x*)ᵀd = ∇g2(x*)ᵀd = 0, while any other direction induces a violation of either one of the constraints), MFCQ does not hold at x* either.
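A short numerical illustration of the failure of (2.4a) at x* follows; the gradients are computed from the constraint expressions given above, and the least-squares computation is purely illustrative.

import numpy as np

# Example 2.13 at x* = (0,0): gradients of the objective and constraints.
grad_f  = np.array([0.0, -2.0])   # gradient of f(x) = x1^2 + (x2-1)^2
grad_g1 = np.array([-2.0, 0.0])   # gradient of g1(x) = x2^3 - 2*x1
grad_g2 = np.array([ 2.0, 0.0])   # gradient of g2(x) = x2^3 + 2*x1

# Best least-squares attempt at stationarity (2.4a): solve A nu = -grad_f.
# Both constraint gradients have zero second component while grad_f has -2,
# so the residual cannot vanish for any multipliers (of any sign).
A = np.column_stack([grad_g1, grad_g2])
nu, *_ = np.linalg.lstsq(A, -grad_f, rcond=None)
print(A @ nu + grad_f)  # second component stays -2.0: x* is not a KKT point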

Figure 2.7: Solution of Example 2.13

The following theorem provides a sufficient condition under which any KKT point of an inequality constrained NLP problem is guaranteed to be a global minimum of that problem.

Theorem 2.11: KKT Sufficient Conditions

Let f : ℝ^nx → ℝ and gi : ℝ^nx → ℝ, i = 1,…,ng, be convex and differentiable functions. Consider the problem to minimize f(x) subject to g(x) ≤ 0. If (x*, ν*) is a KKT point, then x* is a global minimum of that problem.

Proof. Consider the function ℒ(x) := f(x) + Σ_{i=1}^{ng} νi* gi(x). Since f and gi, i = 1,…,ng, are convex functions and νi* ≥ 0, ℒ is also convex (the sum of convex functions is a convex function, and a nonnegative scalar times a convex function is again a convex function). Moreover, the stationarity condition imposes that ∇ℒ(x*) = 0. Hence, by Theorem 2.5, x* is a global minimizer of ℒ on ℝ^nx, i.e.,

ℒ(x) ≥ ℒ(x*), ∀x ∈ ℝ^nx.

In particular, for each x such that gi(x) ≤ gi(x*) = 0, i ∈ 𝒜(x*), and noting that νi* = 0 for each i ∉ 𝒜(x*) by complementary slackness, we have

f(x) − f(x*) ≥ −Σ_{i∈𝒜(x*)} νi* [gi(x) − gi(x*)] ≥ 0.

Noting that {x ∈ ℝ^nx : gi(x) ≤ 0, i ∈ 𝒜(x*)} contains the feasible domain {x ∈ ℝ^nx : gi(x) ≤ 0, i = 1,…,ng}, we have therefore shown that x* is a global minimizer for the problem. □

Example 2.14. Consider the problem

min_{x∈ℝ³} f(x) := ½(x1² + x2² + x3²)
s.t. g1(x) := x1 + x2 + x3 + 3 ≤ 0
     g2(x) := x1 ≤ 0

The point (x*, ν*) with x1* = x2* = x3* = −1, ν1* = 1, and ν2* = 0 being a KKT point, and both the objective function and the feasible set being convex, x* is a global minimum by Theorem 2.11.
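The argument of Theorem 2.11 can also be mimicked numerically: with the multipliers ν* = (1, 0), the convex function ℒ(x) = f(x) + ν1* g1(x) + ν2* g2(x) has zero gradient at x*, so x* minimizes ℒ over all of ℝ³ and hence f over the feasible region. A minimal sketch:

import numpy as np

# Example 2.14: gradient of L(x) = f(x) + nu1*g1(x) + nu2*g2(x) at x*.
nu1, nu2 = 1.0, 0.0
grad_L = lambda x: x + nu1 * np.ones(3) + nu2 * np.array([1.0, 0.0, 0.0])
print(grad_L(np.array([-1.0, -1.0, -1.0])))  # [0. 0. 0.]: stationary point of L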

Both second-order necessary and sufficient conditions for inequality constrained NLP problems will be presented later on, in Section 2.8.