
2.5 Unconstrained problems

Definition 2.14: Unconstrained Programs

An unconstrained program is a problem of minimizing (or maximizing) $f(x)$ without any constraints on the variables $x$:

$$\min \{ f(x) : x \in \mathbb{R}^{n_x} \}$$

Remark 2.15

Note that, since the feasible domain of $x$ is unbounded, Theorem 2.1 does not apply; thus, one does not know with certainty whether a minimum actually exists for this problem. (For unconstrained optimization problems, the existence of a minimum can actually be guaranteed if the objective function is 0-coercive, i.e., $\lim_{\|x\| \to +\infty} f(x) = +\infty$.) Moreover, even if the objective function is convex, such a minimum may not exist (think of $f : x \mapsto \exp(x)$!). Hence, we shall proceed with the theoretically unattractive task of seeking minima and maxima of functions which need not have them!

Given a point $x \in \mathbb{R}^{n_x}$, necessary conditions help determine whether or not it is a local or a global minimum of a function $f$. For this purpose, we are mostly interested in obtaining conditions that can be checked algebraically.

Definition 2.15: Descent Direction

Suppose that $f : \mathbb{R}^{n_x} \to \mathbb{R}$ is continuous at $\bar{x}$. A vector $d \in \mathbb{R}^{n_x}$ is said to be a descent direction, or an improving direction, for $f$ at $\bar{x}$ if

$$\exists \delta > 0 : \ f(\bar{x} + \lambda d) < f(\bar{x}), \quad \forall \lambda \in (0, \delta)$$

Moreover, the cone of descent directions at $\bar{x}$, denoted by $\mathcal{F}(\bar{x})$, is given by

$$\mathcal{F}(\bar{x}) := \{ d : \exists \delta > 0 \text{ such that } f(\bar{x} + \lambda d) < f(\bar{x}), \ \forall \lambda \in (0, \delta) \}$$

Definition 2.15 provides a geometrical characterization of a descent direction. Yet, an algebraic characterization would be more useful from a practical point of view. To this end, let us assume that $f$ is differentiable and define the following set at $\bar{x}$:

$$\mathcal{F}_0(\bar{x}) := \{ d : \nabla f(\bar{x})^\top d < 0 \}$$

This is illustrated in Figure ??, where the half-space $\mathcal{F}_0(\bar{x})$ and the gradient $\nabla f(\bar{x})$ are translated from the origin to $\bar{x}$ for convenience.

Lemma 2.1 proves that every element $d \in \mathcal{F}_0(\bar{x})$ is a descent direction at $\bar{x}$.

Lemma 2.1: Algebraic Characterization of a Descent Direction

Suppose that $f : \mathbb{R}^{n_x} \to \mathbb{R}$ is differentiable at $\bar{x}$. If there exists a vector $d$ such that $\nabla f(\bar{x})^\top d < 0$, then $d$ is a descent direction for $f$ at $\bar{x}$. That is,

$$\mathcal{F}_0(\bar{x}) \subseteq \mathcal{F}(\bar{x})$$

Proof. $f$ being differentiable at $\bar{x}$,

$$f(\bar{x} + \lambda d) = f(\bar{x}) + \lambda \nabla f(\bar{x})^\top d + o(\lambda)$$

where $\lim_{\lambda \to 0} \frac{o(\lambda)}{\lambda} = 0$. Rearranging the terms and dividing by $\lambda > 0$, we get

$$\frac{f(\bar{x} + \lambda d) - f(\bar{x})}{\lambda} = \nabla f(\bar{x})^\top d + \frac{o(\lambda)}{\lambda}$$

Since $\nabla f(\bar{x})^\top d < 0$ and $\lim_{\lambda \to 0} \frac{o(\lambda)}{\lambda} = 0$, there exists a $\delta > 0$ such that $\nabla f(\bar{x})^\top d + \frac{o(\lambda)}{\lambda} < 0$ for all $\lambda \in (0, \delta)$, i.e., $f(\bar{x} + \lambda d) < f(\bar{x})$ for all $\lambda \in (0, \delta)$. □
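As a quick numerical illustration of Lemma 2.1 (a sketch added here, not part of the original development; the function $f$ and the point $\bar{x}$ are arbitrary choices), the direction $d = -\nabla f(\bar{x})$ satisfies $\nabla f(\bar{x})^\top d < 0$ and indeed decreases $f$ for small step sizes:

```python
import numpy as np

def f(x):
    # Arbitrary smooth test function (illustrative only).
    return x[0]**2 + 3.0 * x[1]**2 + x[0] * x[1]

def grad_f(x):
    # Analytical gradient of the test function above.
    return np.array([2.0 * x[0] + x[1], 6.0 * x[1] + x[0]])

x_bar = np.array([1.0, -0.5])
d = -grad_f(x_bar)              # negative gradient direction

# Algebraic test: d belongs to F_0(x_bar) since grad_f(x_bar)^T d < 0.
assert grad_f(x_bar) @ d < 0

# Geometric test: f decreases along d for sufficiently small lambda > 0.
for lam in (1e-1, 1e-2, 1e-3):
    print(lam, f(x_bar + lam * d) < f(x_bar))   # expected: True for each lambda
```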

We are now ready to derive a number of necessary conditions for a point to be a local minimum of an unconstrained optimization problem.

Theorem 2.3: First-Order Necessary Condition for a Local Minimum

Suppose that $f : \mathbb{R}^{n_x} \to \mathbb{R}$ is differentiable at $x^*$. If $x^*$ is a local minimum, then $\nabla f(x^*) = 0$.

Proof. The proof proceeds by contraposition. Suppose that $\nabla f(x^*) \neq 0$. Then, letting $d = -\nabla f(x^*)$, we get $\nabla f(x^*)^\top d = -\|\nabla f(x^*)\|^2 < 0$. By Lemma 2.1,

$$\exists \delta > 0 : \ f(x^* + \lambda d) < f(x^*), \quad \forall \lambda \in (0, \delta)$$

hence contradicting the assumption that $x^*$ is a local minimum for $f$. □

Remark 2.16: Obtaining Candidate Solution Points

The above condition is called a first-order necessary condition because it uses the first-order derivatives of $f$. This condition indicates that the candidate solutions to an unconstrained optimization problem can be found by solving the system of $n_x$ algebraic (nonlinear) equations $\nabla f(\bar{x}) = 0$. Points $\bar{x}$ such that $\nabla f(\bar{x}) = 0$ are known as stationary points. Yet, a stationary point need not be a local minimum; it could very well be a local maximum or even a saddle point.
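In practice, the system $\nabla f(\bar{x}) = 0$ is typically solved numerically. The sketch below illustrates this with SciPy's general-purpose root finder fsolve on an arbitrary two-variable test function (the function and the starting guesses are illustrative assumptions, not taken from the text); different starting guesses may converge to different stationary points, all of which are only candidate solutions:

```python
import numpy as np
from scipy.optimize import fsolve

def grad_f(x):
    # Gradient of f(x1, x2) = x1**4 + x2**2 - 2*x1*x2 (arbitrary test function).
    return np.array([4.0 * x[0]**3 - 2.0 * x[1],
                     2.0 * x[1] - 2.0 * x[0]])

# Each root of grad_f is a stationary point, i.e. only a candidate solution.
for x0 in ([1.0, 1.0], [-1.0, -1.0], [0.1, -0.1]):
    x_bar = fsolve(grad_f, x0)
    print(np.round(x_bar, 6), np.allclose(grad_f(x_bar), 0.0, atol=1e-8))
```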

Example 2.8. Consider the problem

$$\min_{x \in \mathbb{R}} \ -x^6 + x^4 + x^3 - x^2$$

The gradient of the objective function is given by

$$\nabla f(x) = -6 x^5 + 4 x^3 + 3 x^2 - 2 x$$

which has three distinct real roots $x_1$, $x_2$ and $x_3$. Out of these values, $x_2$ gives the smallest cost value. Yet, we cannot declare $x_2$ to be the global minimum, because we do not know whether a (global) minimum exists for this problem. Indeed, as shown in Figure 2.4, none of the stationary points is a global minimum, because $f$ decreases to $-\infty$ as $|x| \to \infty$.

Figure 2.4: Illustration of the objective function and its derivative in Example 2.8.
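Taking the objective of Example 2.8 to be $f(x) = -x^6 + x^4 + x^3 - x^2$ as written above, the stationary points can be located numerically from the coefficients of $\nabla f$; a minimal sketch:

```python
import numpy as np

# Coefficients of f(x) = -x**6 + x**4 + x**3 - x**2 (highest degree first).
f_coeffs = np.array([-1.0, 0.0, 1.0, 1.0, -1.0, 0.0, 0.0])
df_coeffs = np.polyder(f_coeffs)        # -6x^5 + 4x^3 + 3x^2 - 2x

roots = np.roots(df_coeffs)
stationary = np.sort(roots[np.abs(roots.imag) < 1e-9].real)  # the 3 real roots
print(stationary)                        # x1 < x2 < x3
print(np.polyval(f_coeffs, stationary))  # f takes its smallest value at x2
```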

More restrictive necessary conditions can also be derived in terms of the Hessian matrix $\nabla^2 f$, whose elements are the second-order partial derivatives of $f$. One such second-order condition is given below.

Theorem 2.4: Second-Order Necessary Conditions for a Local Minimum

Suppose that $f : \mathbb{R}^{n_x} \to \mathbb{R}$ is twice differentiable at $x^*$. If $x^*$ is a local minimum, then $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is positive semidefinite.

Proof. Consider an arbitrary direction $d$. From the twice differentiability of $f$ at $x^*$, we have

$$f(x^* + \lambda d) = f(x^*) + \lambda \nabla f(x^*)^\top d + \frac{\lambda^2}{2} d^\top \nabla^2 f(x^*) d + o(\lambda^2)$$

where $\lim_{\lambda \to 0} \frac{o(\lambda^2)}{\lambda^2} = 0$. Since $x^*$ is a local minimum, $\nabla f(x^*) = 0$ by Theorem 2.3. Rearranging the terms and dividing by $\lambda^2 > 0$, we get

$$\frac{f(x^* + \lambda d) - f(x^*)}{\lambda^2} = \frac{1}{2} d^\top \nabla^2 f(x^*) d + \frac{o(\lambda^2)}{\lambda^2}$$

Since $x^*$ is a local minimum, $f(x^* + \lambda d) \geq f(x^*)$ for $\lambda$ sufficiently small. Taking the limit as $\lambda \to 0$, it follows that $d^\top \nabla^2 f(x^*) d \geq 0$. Since $d$ is arbitrary, $\nabla^2 f(x^*)$ is positive semidefinite. □
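When the Hessian is not available in closed form, the necessary condition of Theorem 2.4 can still be checked approximately at a stationary point. The sketch below uses a simple central-difference Hessian and an arbitrary test function (both are illustrative assumptions):

```python
import numpy as np

def hessian_fd(f, x, h=1e-4):
    # Central-difference approximation of the Hessian of f at x.
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = h * np.eye(n)[i], h * np.eye(n)[j]
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h**2)
    return H

# Arbitrary test function with a stationary point at the origin.
f = lambda x: x[0]**2 + 0.5 * x[1]**2 + x[0]**2 * x[1]
H = hessian_fd(f, np.zeros(2))
print(np.linalg.eigvalsh(H))   # all eigenvalues >= 0: necessary condition holds
```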

Example 2.9. Consider the problem

$$\min_{x \in \mathbb{R}^2} \ \frac{1}{2} x^\top \begin{bmatrix} 3 & 2 \\ 2 & -1.5 \end{bmatrix} x = \frac{1}{2} x^\top A x$$

The gradient vector of the objective function is given by

$$\nabla f(x) = A x$$

so that the only stationary point in $\mathbb{R}^2$ is $\bar{x} = (0, 0)$. Now, consider the Hessian matrix of the objective function:

$$\nabla^2 f(x) = A = \begin{bmatrix} 3 & 2 \\ 2 & -1.5 \end{bmatrix}, \quad \forall x \in \mathbb{R}^2$$

It is easily checked that $\nabla^2 f(\bar{x})$ is indefinite, since $\operatorname{eig}(A) \approx \{3.8, -2.3\}$. Therefore, by Theorem 2.4, the stationary point $\bar{x}$ is not a (local) minimum (nor is it a local maximum). Such stationary points are called saddle points (see Figure 2.5).

Figure 2.5: Plot and contour plot of the saddle point case.
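The indefiniteness of $A$ in Example 2.9 is easily confirmed numerically; a minimal sketch using the matrix given above:

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [2.0, -1.5]])        # Hessian of f(x) = 0.5 * x.T @ A @ x

print(np.linalg.eigvalsh(A))       # approx. [-2.26, 3.76]
# One negative and one positive eigenvalue: A is indefinite, so the stationary
# point x_bar = (0, 0) is neither a local minimum nor a local maximum (saddle).
```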

The conditions presented in Theorems 2.3 and 2.4 are necessary conditions; that is, they must hold at every local minimum. Yet, a point satisfying these conditions need not be a local minimum. Theorem 2.5 gives sufficient conditions for a stationary point to be a global minimum, provided the objective function is convex on $\mathbb{R}^{n_x}$.

Theorem 2.5: First-Order Sufficient Conditions for a Global Minimum for Convex Functions

Suppose that $f : \mathbb{R}^{n_x} \to \mathbb{R}$ is differentiable at $x^*$ and convex on $\mathbb{R}^{n_x}$. If $\nabla f(x^*) = 0$, then $x^*$ is a global minimum of $f$ on $\mathbb{R}^{n_x}$.

Proof. $f$ being convex on $\mathbb{R}^{n_x}$ and differentiable at $x^*$, by Theorem 2.6 we have

$$f(x) \geq f(x^*) + \nabla f(x^*)^\top [x - x^*], \quad \forall x \in \mathbb{R}^{n_x}$$

But since $x^*$ is a stationary point, $\nabla f(x^*) = 0$, and hence

$$f(x) \geq f(x^*), \quad \forall x \in \mathbb{R}^{n_x}$$ □

Theorem 2.6: First-Order Condition of Convexity

Let $C$ be a convex set in $\mathbb{R}^{n_x}$ with a nonempty interior, and let $f : C \to \mathbb{R}$ be a function. Suppose $f$ is continuous on $C$ and differentiable on $\operatorname{int}(C)$. Then $f$ is convex on $\operatorname{int}(C)$ if and only if

$$f(y) \geq f(x) + \nabla f(x)^\top [y - x]$$

holds for any two points $x, y \in \operatorname{int}(C)$.

Proof. ... □

Theorem 2.7: Second-Order Condition of Convexity

Let $C$ be a convex set in $\mathbb{R}^{n_x}$ with a nonempty interior, and let $f : C \to \mathbb{R}$ be a function. Suppose $f$ is continuous on $C$ and twice differentiable on $\operatorname{int}(C)$. Then $f$ is convex on $\operatorname{int}(C)$ if and only if

$$\nabla^2 f(x) \succeq 0, \quad \forall x \in \operatorname{int}(C)$$

Proof. ... □

Remark 2.17

The corresponding first-order condition for strict convexity is

$$f(y) > f(x) + \nabla f(x)^\top [y - x], \quad \forall x, y \in \operatorname{int}(C), \ x \neq y$$

while the second-order condition

$$\nabla^2 f(x) \succ 0, \quad \forall x \in \operatorname{int}(C)$$

is sufficient, though not necessary, for strict convexity (consider $f : x \mapsto x^4$, which is strictly convex although $f''(0) = 0$).
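The second-order convexity test of Theorem 2.7 can be probed numerically by sampling the Hessian over $C$; the sketch below does this on a grid for an arbitrary convex test function (a heuristic check at finitely many points, not a proof of convexity):

```python
import numpy as np

def hess(x):
    # Hessian of f(x1, x2) = exp(x1) + x1**2 + 2*x2**2 (arbitrary convex example).
    return np.array([[np.exp(x[0]) + 2.0, 0.0],
                     [0.0, 4.0]])

# Sample the condition of Theorem 2.7 over a grid covering C = [-2, 2]^2.
grid = np.linspace(-2.0, 2.0, 21)
psd_on_grid = all(np.all(np.linalg.eigvalsh(hess(np.array([x1, x2]))) >= 0.0)
                  for x1 in grid for x2 in grid)
print(psd_on_grid)   # True: consistent with f being convex on C
```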

The convexity condition required by theorem 2.5 is actually very restrictive, in the sense that many practical problems are nonconvex. In theorem 2.8, we give sufficient conditions for characterizing a local minimum point, provided the objective function is strictly convex in a neighborhood of that point.

Theorem 2.8: Second-Order Sufficient Conditions for a Strict Local Minimum

Suppose that $f : \mathbb{R}^{n_x} \to \mathbb{R}$ is twice differentiable at $x^*$. If $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is positive definite, then $x^*$ is a strict local minimum of $f$.

Proof. $f$ being twice differentiable at $x^*$, we have

$$f(x^* + d) = f(x^*) + \nabla f(x^*)^\top d + \frac{1}{2} d^\top \nabla^2 f(x^*) d + o(\|d\|^2)$$

for each $d \in \mathbb{R}^{n_x}$, where $\lim_{d \to 0} \frac{o(\|d\|^2)}{\|d\|^2} = 0$. Let $\lambda_L$ denote the smallest eigenvalue of $\nabla^2 f(x^*)$. Then, $\nabla^2 f(x^*)$ being positive definite, we have $\lambda_L > 0$ and $d^\top \nabla^2 f(x^*) d \geq \lambda_L \|d\|^2$. Moreover, from $\nabla f(x^*) = 0$, we get

$$f(x^* + d) - f(x^*) \geq \left[ \frac{\lambda_L}{2} + \frac{o(\|d\|^2)}{\|d\|^2} \right] \|d\|^2$$

Since $\lim_{d \to 0} \frac{o(\|d\|^2)}{\|d\|^2} = 0$,

$$\exists \eta > 0 \text{ such that } \left| \frac{o(\|d\|^2)}{\|d\|^2} \right| < \frac{\lambda_L}{4}, \quad \forall d \in \mathbb{B}_\eta(0)$$

and finally,

$$f(x^* + d) - f(x^*) \geq \frac{\lambda_L}{4} \|d\|^2 > 0, \quad \forall d \in \mathbb{B}_\eta(0) \setminus \{0\}$$

i.e., $x^*$ is a strict local minimum of $f$. □

Example 2.10. Consider the problem

$$\min_{x \in \mathbb{R}^2} \ \frac{1}{2} \begin{bmatrix} x_1 - 1 & x_2 - 2 \end{bmatrix} \begin{bmatrix} 5 & 2 \\ 2 & 2 \end{bmatrix} \begin{bmatrix} x_1 - 1 \\ x_2 - 2 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} x_1 - 1 & x_2 - 2 \end{bmatrix} A \begin{bmatrix} x_1 - 1 \\ x_2 - 2 \end{bmatrix}$$

The gradient vector and Hessian matrix at $\bar{x} = (1, 2)$ are given by

$$\nabla f(\bar{x}) = A \begin{bmatrix} \bar{x}_1 - 1 \\ \bar{x}_2 - 2 \end{bmatrix} = 0$$
$$\nabla^2 f(\bar{x}) = A \succ 0$$

Note that $A \succ 0$ since $\operatorname{eig}(A) = \{6, 1\}$, and recall that $\nabla^2 f \succ 0$ is the second-order sufficient condition of strict convexity. Hence, by Theorem 2.8, $\bar{x}$ is a strict local minimum of $f$ ($\bar{x}$ is also a global minimum of $f$ on $\mathbb{R}^2$ since $f$ is convex). The objective function is pictured in Figure 2.6 below.
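The two conditions of Theorem 2.8 can be verified directly from the data of Example 2.10; a minimal sketch:

```python
import numpy as np

A = np.array([[5.0, 2.0],
              [2.0, 2.0]])
x_bar = np.array([1.0, 2.0])

def grad_f(x):
    # Gradient of f(x) = 0.5 * (x - x_bar).T @ A @ (x - x_bar).
    return A @ (x - x_bar)

print(grad_f(x_bar))            # [0. 0.]  -> x_bar is a stationary point
print(np.linalg.eigvalsh(A))    # [1. 6.]  -> Hessian is positive definite
# Both conditions of Theorem 2.8 hold, so x_bar = (1, 2) is a strict local
# minimum (and, by convexity of f, a global minimum on R^2).
```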

We close this subsection by re-emphasizing the fact that every local minimum of an unconstrained optimization problem $\min \{ f(x) : x \in \mathbb{R}^{n_x} \}$ is a global minimum if $f$ is a convex function on $\mathbb{R}^{n_x}$. Yet, convexity of $f$ is not a necessary condition for each local minimum to be a global minimum.

Figure 2.6: Plot and contour plot of the convex quadratic function.