In this section, we will derive a set of necessary optimality conditions that a trajectory must satisfy in order to be a candidate solution for a weak local minimum of a cost functional . Note that if we work with once continuously differentiable functions, every strong extremum (what we mean by "extremum" will be made clear later on) is automatically a weak extremum; as a consequence, the necessary conditions valid for weak extrema are also necessary for strong extrema. We start by considering the basic problem of the calculus of variations according to Definition 3.1, in which the admissible set is:
that is we are looking for a minimizing trajectory in the set of once continuously differentiable vector-valued functions that have given values at the endpoints. Recall that this is a free problem of the calculus of variations.
Definition 3.4: Perturbed functions
Given a function we define a family of perturbed functions as:
where is a small real parameter and are functions such that .
Remark 3.3
Note that by choosing sufficiently small we can make and arbitrarily close in the sense of the . We have that:
A pictorial example of a perturbed function for the scalar case is given in Figure 3.4.
We now give a fundamental definition that can be seen as the generalization of the directional derivative to infinite-dimensional function spaces.
Definition 3.5: First Variation of a Functional, Gâteaux Derivative
Let F be a functional defined on a function space . Then, the first variation of at in the direction also called Gâteaux derivative with respect to at , is defined as:
(provided it exists). If the limit exists for all , then is said to be Gâteaux differentiable at
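For concreteness, in one standard notation (an assumption here: $y$ the nominal function, $\eta$ the direction, $\varepsilon$ the small real parameter), the two definitions read:

```latex
% Assumed standard notation; the symbols in the text may differ.
\[
  y_{\varepsilon}(t) = y(t) + \varepsilon\,\eta(t),
  \qquad \eta(a) = \eta(b) = 0
  \quad \text{(perturbed family, Definition 3.4),}
\]
\[
  \delta J(y;\eta)
  = \lim_{\varepsilon \to 0} \frac{J(y+\varepsilon\eta) - J(y)}{\varepsilon}
  = \left.\frac{d}{d\varepsilon}\, J(y+\varepsilon\eta)\right|_{\varepsilon=0}
  \quad \text{(Gâteaux derivative, Definition 3.5).}
\]
```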
Remark 3.4
Note that for fixed and the functional is actually a real-valued function of that is . Then the Gâteaux derivative is simply the first derivative of the function with respect to computed in , i.e.
Remark 3.5
Note that the Gâteaux derivative need not exist in any direction , or it may exist in some directions and not in others. Its existence presupposes that:
Then:
is defined only if this "ordinary" derivative with respect to the real variable exists at . If the Gâteaux derivative exists, then it is unique.
Example 3.6 (Calculation of a Gâteaux Derivative). Consider the functional . In order to take the Gâteaux derivative we can define and then directly differentiate it. That is:
on the other hand, we can directly take the limit:
Example 3.7 (Non-Existence of a Gâteaux Derivative). Consider the functional . Clearly, is defined on the whole space, since each continuous function results in a continuous integrand whose integral is finite. For and we have:
therefore:
and a Gâteaux derivative does not exist at in the direction .
Lemma 3.1: Descent Direction
Let be a functional defined on a normed linear space (e.g., a function space). Suppose that has a strictly negative variation at a point in some direction . Then, cannot be a local minimum point for (in the sense of the norm ).
Proof. Since
hence,
since as the points are eventually in each neighborhood of , irrespective of the norm considered on . Thus, local minimum behavior of , in the sense of Definition 3.3 is not possible in the direction at . □
Definition 3.6: -Admissible Directions
Let be a functional defined on a subset of a linear space and let . Then, a direction , is said to be -admissible at for if:
Remark 3.6
Note that if is a -admissible direction, then the parametrized family of directions for all is also -admissible. This simple fact follows directly from the definition of -admissible directions.
Remark 3.7: Properties of First Variations
It is instrumental to highlight some properties of the first variation of a functional:
Linearity. Let and be two functionals defined on and a -admissible direction. Consider a linear combination of the two functionals (indeed, it is still a functional). Then . This linearity property is inherited from the linearity of the ordinary derivative.
Homogeneity. Let be a functional defined on and a family of -admissible directions, then . In other words, the first variation is a homogeneous operator.
Note also that, in general, the first variation is not a linear operator in , that is, for and -admissible directions , although it may be in some special cases. We will require linear Gâteaux derivatives to prove some results for inequality-constrained problems.
Theorem 3.1: Geometric Necessary Conditions for a Local Minimum
Let be a functional defined on a subset of a normed linear space . Suppose that is a local minimum point for on . Then:
Proof. By contradiction, suppose that there exists a -admissible direction such that . Then, by Lemma 3.1, cannot be a local minimum for . Likewise, there cannot be a -admissible direction such that . Indeed, being -admissible and by Lemma 3.1, cannot be a local minimum. Overall, we must have that for each -admissible direction at . □
Remark 3.8
Note that Theorem 3.1 can be interpreted as the usual necessary condition for optimality for unconstrained NLP problems. Let's define . Note that the scalar function is the value of the functional when perturbed around a minimizing trajectory , that is, we are assuming that is a local minimum. Equivalently, we are stating that is a local minimum for . Therefore, a necessary condition for to be a local minimum is . But from the definition of we also have:
where the last statement must hold for all admissible directions .
We now state some important results needed for the derivations of the necessary conditions of optimality.
Lemma 3.2
Let and . Suppose that:
then
Proof. By contradiction, suppose there is a such that , say positive (the same reasoning for the negative case is analogous and is left to the reader as an exercise). By continuity of , there exists a ball around on which is always positive, formally . Since is arbitrary, we can choose a function that is zero everywhere except in , where we choose it to be a positive, continuous and differentiable function. Then, for such a , we have:
since both and are positive for and thus we reach a contradiction. Hence, . □
Remark 3.9
Note that, with simple modifications, Lemma 3.2 extends to the case:
where and . We recover the case proved by Lemma 3.2 by rewriting the previous equation as:
Then it is sufficient to consider functions such that and apply the result of Lemma 3.2 to each equation . In this way we obtain that the vector-valued function for each .
We state another standard instrumental theorem without proof.
Theorem 3.2: Differentiation Under the Integral Sign
Let such that be a continuous function with continuous partial derivative on Then
is in with the derivative
Remark 3.9 and Theorem 3.2 are used to obtain a set of necessary conditions for optimality known as Euler equations.
Theorem 3.3: Euler’s Necessary Conditions for Optimality
Consider the problem to minimize the functional:
where the functional space is defined as and a continuously differentiable function. Suppose that gives a (local) minimum for on . Then satisfies the Euler equations:
Proof. We start by computing the first variation of the functional at the minimizing trajectory in a -admissible direction :
By Theorem 3.1 we must have:
By Definition 3.6, -admissible directions are functions such that . Note that in this way we have . We modify the expression of the first variation in order to apply Lemma 3.2. Using integration by parts we have:
but since is a -admissible direction we have:
The necessary condition for optimality then becomes:
The last equation satisfies the conditions of Lemma 3.2, therefore we must have:
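For the basic Lagrange functional $J(y)=\int_a^b f(t,y,\dot y)\,dt$ (a standard form, assumed here; the text's symbols may differ), the chain of equalities in the proof can be summarized as:

```latex
% Assumed standard form, with eta(a) = eta(b) = 0 for D-admissible directions.
\[
  \delta J(y^{*};\eta)
  = \int_a^b \Big( f_y\,\eta + f_{\dot y}\,\dot\eta \Big)\,dt
  = \int_a^b \Big( f_y - \frac{d}{dt} f_{\dot y} \Big)\,\eta \, dt
  = 0
  \quad \text{for all admissible } \eta,
\]
and Lemma 3.2 then yields the Euler equations
\[
  \frac{d}{dt}\,\frac{\partial f}{\partial \dot y}\big(t, y^{*}, \dot y^{*}\big)
  = \frac{\partial f}{\partial y}\big(t, y^{*}, \dot y^{*}\big),
  \qquad t \in [a,b].
\]
```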
Definition 3.7: Stationary Function
Each function that satisfies the Euler equations on some interval is called a stationary function for the Lagrangian .
Remark 3.10
Note that the Euler equations are necessary conditions for optimality, therefore a stationary function can be a minimum, a maximum, or neither. In the literature, stationary functions are also called extremals or extremal trajectories.
Example 3.8 (Simplest calculus of variations problem). Given two points in the plane and , find the curve of minimum length that joins them. A schematic picture is drawn in Figure 3.5. Obviously, the solution to this problem is a line joining and but we will work out the solution using the Euler equation. The functional that we want to minimize is the length of the curve, that is we want to solve the problem:
note that, with respect to the previous notation, is the independent variable and we are seeking a function . We have also indicated . The lagrangian for this problem is . Note that it is independent of . The Euler equation for this problem reads:
As a consequence:
Since the denominator is always different from zero, we have , therefore stationary functions have zero second derivative. We can then integrate this condition to derive :
Finally, imposing that , that is, imposing the boundary conditions and , we obtain:
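As a quick numerical sanity check (not part of the text), one can verify that the straight line has the smallest length among curves joining the same endpoints; we discretize the length functional and compare the line against endpoint-preserving sinusoidal perturbations:

```python
import math

def curve_length(y, a=0.0, b=1.0, n=2000):
    """Approximate the arc-length functional J(y) = ∫ sqrt(1 + y'(x)^2) dx
    with finite differences on a uniform grid."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x0, x1 = a + i * h, a + (i + 1) * h
        dy = (y(x1) - y(x0)) / h          # forward-difference slope
        total += math.sqrt(1.0 + dy * dy) * h
    return total

# Join A = (0, 0) and B = (1, 1): the stationary function is the line y = x.
line = lambda x: x

# Perturbations y(x) + eps*sin(pi x) vanish at both endpoints (admissible),
# and each one is strictly longer than the straight line.
for eps in (0.3, 0.1, 0.01):
    perturbed = lambda x, e=eps: x + e * math.sin(math.pi * x)
    assert curve_length(perturbed) > curve_length(line)

print(curve_length(line))  # close to sqrt(2) ≈ 1.41421
```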
Remark 3.11
Depending on the structure of the lagrangian function , Euler equations can be simplified.
f does not explicitly depend on the function .
Suppose , then the Euler equations become .
f does not explicitly depend on the derivative .
Suppose , then the Euler equations become . This is a degenerate case.
f does not explicitly depend on the independent variable .
Suppose , we define the Hamiltonian function . If is a stationary point then the Hamiltonian is constant. The total time-derivative of the Hamiltonian is:
Therefore, an equivalent necessary condition is , where is a constant. This is also known as the Beltrami identity.
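With the convention $H = \dot y\,f_{\dot y} - f$ (an assumption here; sign conventions vary in the literature), the computation sketched in the remark is:

```latex
\[
  \frac{dH}{dt}
  = \ddot y\, f_{\dot y} + \dot y\,\frac{d}{dt} f_{\dot y}
    - f_t - f_y\,\dot y - f_{\dot y}\,\ddot y
  = \dot y \Big( \frac{d}{dt} f_{\dot y} - f_y \Big) - f_t .
\]
On a stationary function the Euler equation annihilates the first term and, since
$f$ does not depend explicitly on $t$, also $f_t = 0$; hence $H$ is constant,
\[
  f - \dot y\,\frac{\partial f}{\partial \dot y} = C
  \qquad \text{(Beltrami identity).}
\]
```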
Example 3.9 (A more challenging problem: The Brachistochrone). Recall the Brachistochrone Problem 3.1:
For the sake of simplicity, we consider a slightly simplified problem. Assume that the initial velocity is zero and that the initial point is the origin, that is . The lagrangian for this problem is . Note that the lagrangian is independent of , hence the Hamiltonian is constant along extremals. Therefore, we have:
Which is equivalent to:
for a new constant . Thus we have:
that is equivalent to:
In order to solve the previous integral we make the substitution , therefore . The integral becomes:
where is a constant to be determined. Overall we have obtained a curve in the parameter that is:
This is the parametric form of a family of cycloids depending on the constants (which in turn depends on ) and . The first boundary condition allows us to easily find in terms of . That is:
Thus, we obtain a smaller family of cycloids passing through the origin, as shown in Figure 3.6, whose equations are:
note that these are all minimum-time curves. The second boundary condition is somewhat more involved and must be solved numerically; the nonlinear system is:
The solution joining points and is shown in red in Figure 3.6. The constant .
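The numerical step can be sketched as follows. A minimal sketch, assuming the common parametrization $x(\theta)=R(\theta-\sin\theta)$, $y(\theta)=R(1-\cos\theta)$ with $y$ measured downward from the origin (the text's constants and conventions may differ); since the ratio $y/x$ along the cycloid is monotone in $\theta$, a simple bisection suffices:

```python
import math

def cycloid_through(xf, yf, tol=1e-12):
    """Find (R, theta_f) such that the cycloid
       x(th) = R*(th - sin th),  y(th) = R*(1 - cos th)
    (y measured downward, starting at the origin -- an assumed convention)
    passes through the target point (xf, yf), with xf, yf > 0."""
    target = yf / xf
    ratio = lambda th: (1.0 - math.cos(th)) / (th - math.sin(th))
    lo, hi = 1e-9, 2.0 * math.pi - 1e-9   # ratio is decreasing on (0, 2*pi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ratio(mid) > target:
            lo = mid                       # solution lies at a larger theta
        else:
            hi = mid
    theta_f = 0.5 * (lo + hi)
    R = xf / (theta_f - math.sin(theta_f))
    return R, theta_f

R, th = cycloid_through(1.0, 0.65)
# Residuals of the two boundary conditions (both numerically zero):
print(R * (th - math.sin(th)) - 1.0, R * (1.0 - math.cos(th)) - 0.65)
```

The target point `(1.0, 0.65)` is only illustrative, not the one used in Figure 3.6.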
Remark 3.12: Hamilton’s Interpretation of Lagrangian Mechanics
According to the principle of least action, a system in motion follows the trajectory that minimizes the action functional:
where are the generalized lagrangian coordinates of a system with degrees of freedom, is the kinetic energy and is the potential energy. The Euler necessary condition reads:
which is the usual Lagrange equation of analytical mechanics. In view of this mechanical interpretation, recall the definition of the Hamiltonian . In general, the kinetic energy has the form . In this case the Hamiltonian reads:
where is the total energy of the system. According to Remark 3.11 if the lagrangian is independent of time the Hamiltonian is constant along an extremal trajectory. In other words the total energy of the system is conserved. Finally, recall that the momentum of a mechanical system can be defined as , hence the special case when the lagrangian is independent of in Remark 3.11 can be viewed as a momentum conservation that is .
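For a kinetic energy quadratic in the generalized velocities, the computation runs as follows (standard notation assumed: $q$ the generalized coordinates, $M(q)$ the symmetric mass matrix):

```latex
\[
  L(q,\dot q) = T - V, \qquad
  T = \tfrac{1}{2}\,\dot q^{\top} M(q)\,\dot q, \qquad
  p = \frac{\partial L}{\partial \dot q} = M(q)\,\dot q,
\]
\[
  H = p^{\top}\dot q - L
    = \dot q^{\top} M(q)\,\dot q - (T - V)
    = 2T - T + V
    = T + V,
\]
the total energy of the system.
```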
Analogously to NLP problems, for problems of the calculus of variations we can also derive a second-order necessary condition. Recall that in the finite-dimensional NLP case the second-order necessary condition was the positive semi-definiteness of the Hessian matrix, which is equivalent to the objective function being locally convex in a neighbourhood of a stationary point. Similar conditions hold in the infinite-dimensional case. First, we need the following definition:
Definition 3.8: Second Variation of a Functional
Let F be a functional defined on a function space . Then, the second variation of at in the direction is defined as:
(provided it exists).
Theorem 3.4: Second-Order Necessary Conditions (Legendre)
Consider the problem to minimize the functional:
where the functional space is defined as such that and a continuously differentiable function. Suppose that gives a (local) minimum for on . Then satisfies the Euler equation along with the so-called Legendre condition:
Proof. Based on the differentiability properties of Theorem 3.2 we can compute the second variation as:
where we have used the compressed notation . Substituting gives:
Define the real-valued function of , . We have that satisfies the necessary conditions for a local minimum of an unconstrained optimization problem, since is a local minimum for the functional . Therefore and . Note that . Hence we have:
The last equation must hold true for each perturbation such that . We use Einstein’s notation to imply the double summation, that is:
to simplify the notation let’s define . Note that due to Schwarz’s theorem is symmetric. Using integration by parts we have:
then, using the symmetry of A and the boundary conditions on , we have:
hence we have:
The necessary condition on the second variation becomes:
by Lemma 3.3, a necessary condition for to be nonnegative is:
Lemma 3.3
Let and be given continuous symmetric matrix functions on and let the quadratic functional:
be defined for all such that . Then, a necessary condition for the quadratic functional to be nonnegative for all such is that for each .
It would be tempting to extend the analogy with NLP problems and look for a sufficient condition. Recall that a sufficient condition for a local minimum of an unconstrained NLP was stationarity together with strict convexity of the Hessian at the candidate solution, see Theorem 2.8. Unfortunately, the requirements that be a stationary function for and that the lagrangian be strictly convex with respect to the third argument are not sufficient for to be a (weak) local minimum of a free problem of the calculus of variations. Additional conditions must hold, such as the absence of points conjugate to the point in ; this condition is called Jacobi's sufficient condition. We refer the reader to the vast literature on this topic. Instead, we give a sufficient condition for a global minimum that relies on the lagrangian being jointly convex in the second and third arguments , respectively. Unfortunately, this condition is seldom satisfied in practice. Recall that for a continuously differentiable function , joint convexity means:
Theorem 3.5: Sufficient Conditions for a Weak Global Minimum
Consider the problem to minimize the functional:
where the functional space is defined as such that and a continuously differentiable function. Suppose that the Lagrangian is [strictly] jointly convex in . If is a stationary function for the Lagrangian (i.e., it satisfies the Euler necessary conditions), then is also a [strict] global minimizer for on .
Proof.
where we have used joint convexity of in the second and third argument. Subsequently, we make use of integration by parts and the fact that and are equal at the boundary, that is and , together with the hypothesis that is a stationary function. Overall we have:
which is the definition of global minimum of a functional. □
Example 3.10. Consider the problem P to minimize the functional
on . The lagrangian is convex in and does not depend on ; then, thanks to Theorem 3.5, every stationary function is a global minimum. The Euler necessary condition for this case reduces to:
therefore by double integration and imposing the boundary condition we obtain:
that is, a straight line joining the origin and the point . The extremal function is the unique global minimum thanks to Theorem 3.5, since it is the unique stationary function of the functional to be minimized.
We now consider slightly different and more involved problems. Consider the problem in the following definition:
Definition 3.9: Calculus of Variation Free End-point problem
Consider the Calculus of Variations problem with free end-point and end-time:
Remark 3.13
The cost functional in Definition 3.9 represents a Bolza problem. Note that neither the end-time nor the end-point is specified, so our search space of functions is in some sense larger. The search space is defined as for sufficiently large, so that it can include . We need to derive additional conditions that an extremal must satisfy; these conditions should allow us to find an equation for the end-point and for the end-time at which this end-point is reached. Informally, we are then looking for additional equations: equations are needed for , while one equation is needed for the final time .
We give without proof an important theorem that is instrumental to derive optimality conditions for this problem.
Theorem 3.6: Rule of Leibniz
Let such that be a continuous function with continuous partial derivative on and . Then
Theorem 3.7: Necessary conditions for free end-point and end-time problems
Consider the problem in Definition 3.9. If is a local minimum point then must satisfy:
Proof. Recall that the geometric necessary condition in Theorem 3.1 does not assume any specific form for the functional . Hence, a necessary condition for a local minimum of a free end-point problem is again for all -admissible directions . Note that, in order to be a -admissible direction for this problem, must be such that . We highlight again the fact that is not given and that need not satisfy any boundary condition at . Since is not fixed, we need to allow variations of the end-time around the optimal time , thus we consider end-time variations of the form . It is essential to notice that the modulating parameter is the same for variations of the end-time and of the trajectory . With this assumption we do not lose generality, since and are arbitrary. The first variation of the Bolza-type functional is:
we carry out the derivative term by term. Note that in the first term we can apply Theorem 3.6. Thus we have:
again we have used the compressed notation . Using integration by parts and the boundary condition we have:
while the second term in the expression of the first variation is:
the geometric necessary condition for optimality is therefore:
since the necessary condition must hold along any -admissible direction and final endpoint variation , the underlying rationale is to choose variations that allow us to derive a set of necessary conditions in the form of differential equations or boundary conditions. First, we consider a family of variations such that and such that also holds at the second endpoint. Note that these variations are indeed -admissible. Therefore we have:
Note that the previous equation is exactly the equation from which we derived the Euler necessary condition. We conclude that the Euler equation is a necessary condition and must hold at a stationary function also in this case.
Note that is yet to be determined. We now consider a second family of -admissible directions such that but is free to vary. Since the Euler equation is satisfied, we have:
we can conclude that:
this is a set of boundary conditions at that implicitly defines . These conditions are called transversality conditions. Finally, we use a family of variations that nullifies the term , hence variations that satisfy while is left free to vary. Again, the Euler equation is satisfied, and thus we have:
Therefore we have:
This is a scalar equation that implicitly defines . Note that the first two terms are precisely the definition of the Hamiltonian evaluated at the optimal final instant. The condition can be rewritten as
Remark 3.14
A similar reasoning holds for more general Bolza problems where the function depends also on the initial time, that is . In this case, neither the initial point nor the initial time is fixed, hence the -admissible directions need not satisfy . Then, the transversality condition and the end-point condition on the Hamiltonian must also hold at the initial time.
Remark 3.15
The free end-point problem of Lagrange (i.e., when ) can be handled in the same way; applying the necessary conditions of Theorem 3.7 yields the so-called natural boundary conditions:
these boundary conditions must hold at a stationary function together with the Euler equations. Note that if is fixed while is left free, only the second equation must hold, and vice versa.
Example 3.11. Consider the problem to minimize the functional:
in which the final time is fixed but not the endpoint . is a scalar weight. In this case, is a strongly convex nonnegative function that penalizes the distance of the final endpoint from the given target . We highlight again the fact that is not given and must be found using the transversality condition. The term under the integral sign is a strongly convex function that penalizes high values of the derivative of . Informally, given an initial point , we want to find the curve that reaches a trade-off between having a small derivative along its trajectory and approaching the target as closely as possible. Obviously, the relative weight of these objectives depends on the magnitude of . For the sake of simplicity, let's consider . The Euler necessary condition reads:
note that we can only assign the initial boundary condition . The transversality condition reads:
Therefore the extremal trajectory is:
note that as we have that , that is, in the limit of an infinite weight the endpoint reaches the target. This would be the solution of the Lagrange problem with the same lagrangian and the endpoint fixed. On the other hand, for a vanishing weight (i.e., ), tends to a constant value throughout the trajectory. Indeed, a constant trajectory is the global minimum of the functional when only the initial point is fixed. The extremal trajectories for a finite are straight lines in the interval that reach the optimal trade-off between minimizing the value of the derivative and reaching the target. A simple solution with and is shown in Figure 3.7. In this simple case we can also compute analytically the optimal value of the cost functional as a function of the weight . Since and , we have:
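The limiting behaviour described above can be checked numerically. A minimal sketch, assuming for illustration the cost $J(y)=\int_0^T \dot y^{\,2}\,dt + \beta\,(y(T)-\bar y)^2$ with $y(0)=y_0$ (our normalization; the text's weights may differ): since the Euler equation gives $\ddot y = 0$, extremals are straight lines, and the transversality condition predicts the slope $s^\ast = \beta(\bar y - y_0)/(1+\beta T)$.

```python
def best_slope(beta, y0=0.0, ybar=1.0, T=1.0):
    """Return the slope s minimizing the cost of the line y(t) = y0 + s*t for
       J = ∫_0^T ydot^2 dt + beta*(y(T) - ybar)^2
    (an assumed illustrative functional), found by a dense scan over s."""
    cost = lambda s: s * s * T + beta * (y0 + s * T - ybar) ** 2
    slopes = [i / 10000.0 for i in range(-20000, 20001)]  # scan s in [-2, 2]
    return min(slopes, key=cost)

# Transversality predicts s* = beta*(ybar - y0) / (1 + beta*T):
for beta in (0.1, 1.0, 10.0, 100.0):
    s_num = best_slope(beta)
    s_pred = beta * 1.0 / (1.0 + beta)
    assert abs(s_num - s_pred) < 1e-3

# beta -> infinity: slope -> (ybar - y0)/T = 1, the curve reaches the target;
# beta -> 0: slope -> 0, the trajectory tends to a constant.
```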