
3.3 Optimality conditions for weak extrema

In this section, we derive a set of necessary optimality conditions that a trajectory $x$ must satisfy in order to be a candidate solution for a weak local minimum of a cost functional $F[x]$. Note that, if we choose $\mathcal{X} = \mathcal{C}^1$, that is, if we work with once continuously differentiable functions, every strong extremum (what we mean by "extremum" will be made clear later on) is automatically a weak extremum; as a consequence, the necessary conditions valid for weak extrema are necessary also for strong extrema. We start by considering the basic problem of the calculus of variations according to Definition 3.1, in which the admissible set is:

$$\mathcal{D} := \left\{ x \in \mathcal{C}^1([t_1,t_2])^{n_x} : x(t_1) = x_1,\; x(t_2) = x_2 \right\},$$

that is, we are looking for a minimizing trajectory in the set of once continuously differentiable vector-valued functions that take given values at the endpoints. Recall that this is a free problem of the calculus of variations.

Definition 3.4: Perturbed functions

Given a function $x_0 \in \mathcal{D}$, we define a family of perturbed functions as:

$$x(t) = x_0(t) + \alpha\,\xi(t),$$

where $\alpha$ is a small real parameter and $\xi \in \mathcal{C}^1([t_1,t_2])^{n_x}$ is a function such that $\xi(t_1) = \xi(t_2) = 0$.

Remark 3.3

Note that by choosing $\alpha$ sufficiently small we can make $x_0$ and $x = x_0 + \alpha\xi$ arbitrarily close in the sense of the norm $\|\cdot\|_{1,\infty}$. We have that:

$$\|x - x_0\|_{1,\infty} = \max_{t_1 \le t \le t_2} \|\alpha\,\xi(t)\| + \max_{t_1 \le t \le t_2} \|\alpha\,\dot{\xi}(t)\| = |\alpha|\,\|\xi\|_{1,\infty}.$$

A pictorial example of a perturbed function for the scalar case is given in Figure 3.4.

Figure 3.4: Perturbed function

We now give a fundamental definition that can be seen as the generalization of the directional derivative to infinite-dimensional function spaces.

Definition 3.5: First Variation of a Functional, Gâteaux Derivative

Let $F$ be a functional defined on a function space $\mathcal{X}$. Then, the first variation of $F$ at $x \in \mathcal{X}$ in the direction $\xi \in \mathcal{X}$, also called the Gâteaux derivative with respect to $\xi$ at $x$, is defined as:

$$\delta F_x[\xi] = \lim_{\alpha \to 0} \frac{F[x+\alpha\xi] - F[x]}{\alpha} = \left.\frac{\partial}{\partial\alpha} F[x+\alpha\xi]\right|_{\alpha=0},$$

(provided it exists). If the limit exists for all $\xi \in \mathcal{X}$, then $F$ is said to be Gâteaux differentiable at $x$.

Remark 3.4

Note that for fixed $x$ and $\xi$ the functional $F[x+\alpha\xi]$ is actually a real-valued function of $\alpha$, that is, $g(\alpha) := F[x+\alpha\xi]$. Then the Gâteaux derivative is simply the first derivative of the function $g$ with respect to $\alpha$ evaluated at $\alpha = 0$, i.e. $\delta F_x[\xi] = \left.\frac{dg}{d\alpha}\right|_0$.

Remark 3.5

Note that the Gâteaux derivative $\delta F_x[\xi]$ need not exist in any direction $\xi \neq 0$, or it may exist in some directions and not in others. Its existence presupposes that:

  • F[x] is defined,
  • F[x+αξ] is defined for all sufficiently small α .

Then:

$$\delta F_x[\xi] = \left.\frac{\partial}{\partial\alpha} F[x+\alpha\xi]\right|_{\alpha=0} = \left.\frac{dg}{d\alpha}\right|_0$$

is defined only if this "ordinary" derivative with respect to the real variable $\alpha$ exists at $\alpha = 0$. If the Gâteaux derivative exists, then it is unique.

Example 3.6 (Calculation of a Gâteaux Derivative). Consider the functional $F[x] = \int_a^b x(t)^2\,dt$. In order to take the Gâteaux derivative we can define $g(\alpha) := F[x+\alpha\xi]$ and then directly differentiate it. That is:

$$\delta F_x[\xi] = \left.\frac{d}{d\alpha} g(\alpha)\right|_{\alpha=0} = \left.\left[\frac{d}{d\alpha}\int_a^b (x(t)+\alpha\xi(t))^2\,dt\right]\right|_{\alpha=0} = \left.\left[\int_a^b \frac{d}{d\alpha}(x(t)+\alpha\xi(t))^2\,dt\right]\right|_{\alpha=0} = \left.\left[\int_a^b 2(x(t)+\alpha\xi(t))\,\xi(t)\,dt\right]\right|_{\alpha=0} = \int_a^b 2\,x(t)\,\xi(t)\,dt;$$

on the other hand, we can directly take the limit:

$$\delta F_x[\xi] = \lim_{\alpha\to 0}\frac{F[x+\alpha\xi]-F[x]}{\alpha} = \lim_{\alpha\to 0}\frac{\int_a^b (x(t)+\alpha\xi(t))^2\,dt - \int_a^b x(t)^2\,dt}{\alpha} = \lim_{\alpha\to 0}\frac{\int_a^b 2\alpha\,x(t)\,\xi(t) + \alpha^2\,\xi(t)^2\,dt}{\alpha} = \int_a^b 2\,x(t)\,\xi(t)\,dt.$$
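The same derivative is easy to check symbolically; a minimal sketch with sympy, using the hypothetical choices $x(t) = t$ and $\xi(t) = t(1-t)$ on $[a,b] = [0,1]$ (both picked only for illustration):

```python
import sympy as sp

t, alpha = sp.symbols('t alpha')
a, b = 0, 1
x = t               # candidate trajectory (assumed, for illustration)
xi = t * (1 - t)    # perturbation vanishing at both endpoints

# g(alpha) = F[x + alpha*xi] with F[x] = int_a^b x(t)^2 dt
g = sp.integrate((x + alpha * xi) ** 2, (t, a, b))

# Gateaux derivative: dg/dalpha evaluated at alpha = 0
gateaux = sp.diff(g, alpha).subs(alpha, 0)

# closed form from the example: int_a^b 2*x(t)*xi(t) dt
closed_form = sp.integrate(2 * x * xi, (t, a, b))

assert sp.simplify(gateaux - closed_form) == 0
print(gateaux)   # 1/6 for these particular choices
```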

Example 3.7 (Non-Existence of a Gâteaux Derivative). Consider the functional $F[x] := \int_0^1 |x(t)|\,dt$. Clearly, $F[x]$ is defined on all of $\mathcal{C}^1[0,1]$: each continuous function $x \in \mathcal{C}^1[0,1]$ results in a continuous integrand $|x(t)|$ whose integral is finite. For $x_0(t) := 0$ and $\xi(t) := t$ we have:

$$F[x_0+\alpha\xi] = \int_0^1 |\alpha t|\,dt,$$

therefore:

$$\frac{F[x_0+\alpha\xi] - F[x_0]}{\alpha} = \begin{cases} \tfrac{1}{2} & \text{if } \alpha > 0 \\ -\tfrac{1}{2} & \text{if } \alpha < 0 \end{cases}$$

and a Gâteaux derivative does not exist at $x_0$ in the direction $\xi$.
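The disagreement of the two one-sided limits can also be seen numerically; a small sketch using trapezoidal quadrature on a fine grid:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 100_001)

def F(x):
    # F[x] = int_0^1 |x(t)| dt, approximated by the trapezoidal rule
    return np.trapz(np.abs(x), t)

x0 = np.zeros_like(t)   # x0(t) = 0
xi = t                  # xi(t) = t

for alpha in (1e-3, -1e-3):
    quotient = (F(x0 + alpha * xi) - F(x0)) / alpha
    print(alpha, quotient)   # approximately +0.5 and -0.5: no common limit
```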

Lemma 3.1: Descent Direction

Let $F$ be a functional defined on a normed linear space $(\mathcal{X}, \|\cdot\|)$ (e.g. a function space). Suppose that $F$ has a strictly negative variation $\delta F_{\bar{x}}[\xi] < 0$ at a point $\bar{x} \in \mathcal{X}$ in some direction $\xi \in \mathcal{X}$. Then, $\bar{x}$ cannot be a local minimum point for $F$ (in the sense of the norm $\|\cdot\|$).

Proof. Since $\delta F_{\bar{x}}[\xi] < 0$,

$$\exists\, \delta > 0 \text{ such that } \frac{F[\bar{x}+\alpha\xi] - F[\bar{x}]}{\alpha} < 0 \quad \forall \alpha \in \mathcal{B}_\delta(0) \setminus \{0\};$$

hence,

$$F[\bar{x}+\alpha\xi] < F[\bar{x}], \quad \forall \alpha \in (0,\delta);$$

since $\|\bar{x}+\alpha\xi - \bar{x}\| = |\alpha|\,\|\xi\| \to 0$ as $\alpha \to 0^+$, the points $\bar{x}+\alpha\xi$ are eventually in each neighborhood of $\bar{x}$, irrespective of the norm considered on $\mathcal{X}$. Thus, local minimum behavior of $F$, in the sense of Definition 3.3, is not possible in the direction $\xi$ at $\bar{x}$. □

Definition 3.6: $\mathcal{D}$-Admissible Directions

Let $F$ be a functional defined on a subset $\mathcal{D}$ of a linear space $\mathcal{X}$ and let $\bar{x} \in \mathcal{D}$. Then, a direction $\xi \in \mathcal{X}$, $\xi \neq 0$, is said to be $\mathcal{D}$-admissible at $\bar{x}$ for $F$ if:

  • $\delta F_{\bar{x}}[\xi]$ exists.
  • $\bar{x} + \alpha\xi \in \mathcal{D}$ for all sufficiently small $\alpha$, that is:
    $\exists\, \delta > 0$ such that $\bar{x}+\alpha\xi \in \mathcal{D}, \;\forall \alpha \in \mathcal{B}_\delta(0)$.

Remark 3.6

Note that if $\xi$ is a $\mathcal{D}$-admissible direction, then also the parametrized family of directions $a\xi$, for all $a \in \mathbb{R}$, consists of $\mathcal{D}$-admissible directions. This simple fact follows directly from the definition of $\mathcal{D}$-admissible directions.

Remark 3.7: Properties of First Variations

It is instrumental to highlight some properties of the first variation of a functional:
Linearity. Let $F_1$ and $F_2$ be two functionals defined on $\mathcal{D}$ and $\xi$ a $\mathcal{D}$-admissible direction. Consider a linear combination of the two functionals $\tilde{F} = a_1 F_1 + a_2 F_2$ (indeed $\tilde{F}$ is still a functional). Then $\delta\tilde{F}_x[\xi] = a_1\,\delta F_{1,x}[\xi] + a_2\,\delta F_{2,x}[\xi]$. This linearity property is inherited from the linearity of the ordinary derivative.
Homogeneity. Let $F$ be a functional defined on $\mathcal{D}$ and $\{a\xi\}_{a \in \mathbb{R}}$ a family of $\mathcal{D}$-admissible directions; then $\delta F_x[a\xi] = a\,\delta F_x[\xi]$ for all $a \in \mathbb{R}$. In other words, the first variation is a homogeneous operator.

Note also that in general the first variation is not a linear operator in $\xi$, that is, $\delta F_x[a_1\xi_1 + a_2\xi_2] \neq a_1\,\delta F_x[\xi_1] + a_2\,\delta F_x[\xi_2]$ for $a_1, a_2 \in \mathbb{R}$ and $\xi_1, \xi_2$ $\mathcal{D}$-admissible directions, although equality may hold in some special cases. We will require linear Gâteaux derivatives to prove some results for inequality-constrained problems.

Theorem 3.1: Geometric Necessary Conditions for a Local Minimum

Let $F$ be a functional defined on a subset $\mathcal{D}$ of a normed linear space $(\mathcal{X}, \|\cdot\|)$. Suppose that $x \in \mathcal{D}$ is a local minimum point for $F$ on $\mathcal{D}$. Then:

$$\delta F_x[\xi] = 0 \quad \text{for each } \mathcal{D}\text{-admissible direction } \xi \text{ at } x.$$

Proof. By contradiction, suppose that there exists a $\mathcal{D}$-admissible direction $\xi$ such that $\delta F_x[\xi] < 0$. Then, by Lemma 3.1, $x$ cannot be a local minimum for $F$. Likewise, there cannot be a $\mathcal{D}$-admissible direction $\xi$ such that $\delta F_x[\xi] > 0$. Indeed, $-\xi$ being $\mathcal{D}$-admissible and $\delta F_x[-\xi] = -\delta F_x[\xi] < 0$, by Lemma 3.1, $x$ cannot be a local minimum. Overall, we must have that $\delta F_x[\xi] = 0$ for each $\mathcal{D}$-admissible direction $\xi$ at $x$. □

Remark 3.8

Note that Theorem 3.1 can be interpreted as the usual necessary condition for optimality for unconstrained NLP problems. Let us define $g(\alpha) := F[x+\alpha\xi]$. Note that the scalar function $g(\alpha)$ is the value of the functional when perturbed around a minimizing trajectory $x$; that is, we are assuming that $F[x]$ is a local minimum. Equivalently, we are stating that $g(0) \equiv F[x]$ is a local minimum for $g(\alpha)$. Therefore a necessary condition for $g(0)$ to be a local minimum is $\left.\frac{dg}{d\alpha}\right|_{\alpha=0} = 0$. But by the definition of $g$ we also have:

$$\left.\frac{dg}{d\alpha}\right|_{\alpha=0} = \delta F_x[\xi] = 0,$$

where the last statement must hold for all $\mathcal{D}$-admissible directions $\xi$.

We now state some important results needed for the derivations of the necessary conditions of optimality.

Lemma 3.2

Let $h \in \mathcal{C}[t_1,t_2]$ and $\xi \in \mathcal{D} = \{\xi \in \mathcal{C}^1[t_1,t_2] \text{ such that } \xi(t_1) = \xi(t_2) = 0\}$. Suppose that:

$$\int_{t_1}^{t_2} h(t)\,\xi(t)\,dt = 0 \quad \forall \xi \in \mathcal{D};$$

then $h(t) \equiv 0 \;\;\forall t \in [t_1,t_2]$.

Proof. By contradiction, suppose there is a $\tilde{t} \in [t_1,t_2]$ such that $h(\tilde{t}) \neq 0$, say $h(\tilde{t}) > 0$ (the same reasoning for $h(\tilde{t}) < 0$ is analogous and is left to the reader as an exercise). By continuity of $h$ there exists a ball around $\tilde{t}$ in which $h$ is always positive; formally, $\exists\, \delta > 0$ such that $h(t) > 0 \;\forall t \in \mathcal{B}_\delta(\tilde{t})$. Since $\xi$ is arbitrary, we can choose a function that is $0$ everywhere but in $\mathcal{B}_\delta(\tilde{t})$, where we choose it to be a positive continuous and differentiable function for $t \in \mathcal{B}_\delta(\tilde{t})$. Then, for such a $\xi$, we have:

$$\int_{t_1}^{t_2} h(t)\,\xi(t)\,dt = \int_{\mathcal{B}_\delta(\tilde{t})} h(t)\,\xi(t)\,dt > 0,$$

since both $h(t)$ and $\xi(t)$ are positive for $t \in \mathcal{B}_\delta(\tilde{t})$, and thus we reach a contradiction. Hence, $h(t) = 0 \;\;\forall t \in [t_1,t_2]$. □

Remark 3.9

Note that with simple modifications Lemma 3.2 extends to the case:

$$\int_{t_1}^{t_2} h(t)^\top \xi(t)\,dt = 0 \quad \forall \xi \in \mathcal{D},$$

where $h \in \mathcal{C}[t_1,t_2]^{n_x}$ and $\xi \in \mathcal{D} = \{\xi \in \mathcal{C}^1[t_1,t_2]^{n_x} \text{ such that } \xi(t_1) = \xi(t_2) = 0\}$. We recover the case proved by Lemma 3.2 by rewriting the previous equation as:

$$\int_{t_1}^{t_2} h(t)^\top \xi(t)\,dt = \int_{t_1}^{t_2} \sum_{i=1}^{n_x} h_i(t)\,\xi_i(t)\,dt = 0 \quad \forall \xi \in \mathcal{D}.$$

Then it is sufficient to consider functions $\xi_i$ such that $\xi_j = 0 \;\forall j \neq i$ and apply the result of Lemma 3.2 to each equation $\int_{t_1}^{t_2} h_i(t)\,\xi_i(t)\,dt = 0$. In this way we obtain that the vector-valued function $h(t) = 0$ for each $t \in [t_1,t_2]$.

We state another standard instrumental theorem without proof.

Theorem 3.2: Differentiation Under the Integral Sign

Let $f: \mathbb{R}\times\mathbb{R} \to \mathbb{R}$, $f := f(t,\alpha)$, be a continuous function with continuous partial derivative $f_\alpha = \frac{\partial f}{\partial\alpha}$ on $[t_1,t_2]\times[\alpha_1,\alpha_2]$. Then

$$g(\alpha) := \int_{t_1}^{t_2} f(t,\alpha)\,dt$$

is in $\mathcal{C}^1[\alpha_1,\alpha_2]$, with the derivative

$$\frac{d}{d\alpha} g(\alpha) = \frac{d}{d\alpha}\int_{t_1}^{t_2} f(t,\alpha)\,dt = \int_{t_1}^{t_2} f_\alpha(t,\alpha)\,dt.$$
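Theorem 3.2 lends itself to a quick numerical spot-check; a sketch in which the integrand $f(t,\alpha) = \sin(\alpha t)$ is an arbitrary smooth choice made only for illustration:

```python
import numpy as np
from scipy.integrate import quad

f = lambda t, a: np.sin(a * t)                # f(t, alpha)
f_alpha = lambda t, a: t * np.cos(a * t)      # continuous partial derivative

t1, t2, a0, h = 0.0, 2.0, 1.5, 1e-6

g = lambda a: quad(f, t1, t2, args=(a,))[0]   # g(alpha) = int f(t, alpha) dt

lhs = (g(a0 + h) - g(a0 - h)) / (2 * h)       # central difference of g
rhs = quad(f_alpha, t1, t2, args=(a0,))[0]    # integral of f_alpha
print(lhs, rhs)                               # the two values agree closely
```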

Remark 3.9 and Theorem 3.2 are used to obtain a set of necessary conditions for optimality known as Euler equations.

Theorem 3.3: Euler’s Necessary Conditions for Optimality

Consider the problem to minimize the functional:

$$\min_{x(t)} F[x] = \int_{t_1}^{t_2} f(t,x(t),\dot{x}(t))\,dt \quad \text{s.t.} \quad x \in \mathcal{D},$$

where the functional space is defined as $\mathcal{D} = \{x \in \mathcal{C}^1[t_1,t_2]^{n_x} \text{ such that } x(t_1) = x_1,\; x(t_2) = x_2\}$ and $f: \mathbb{R}\times\mathbb{R}^{n_x}\times\mathbb{R}^{n_x} \to \mathbb{R}$ is a continuously differentiable function. Suppose that $x \in \mathcal{D}$ gives a (local) minimum for $F$ on $\mathcal{D}$. Then $x$ satisfies the Euler equations:

$$\frac{d}{dt}\frac{\partial f(t,x(t),\dot{x}(t))}{\partial\dot{x}} = \frac{\partial f(t,x(t),\dot{x}(t))}{\partial x}.$$

Proof. We start by computing the first variation of the functional F at the minimizing trajectory x in a 𝒟 -admissible direction ξ :

$$\delta F_x[\xi] = \left.\frac{\partial}{\partial\alpha} F[x+\alpha\xi]\right|_{\alpha=0} = \left.\left[\int_{t_1}^{t_2}\left\{\frac{\partial f(t,x+\alpha\xi,\dot{x}+\alpha\dot{\xi})}{\partial\dot{x}}\dot{\xi} + \frac{\partial f(t,x+\alpha\xi,\dot{x}+\alpha\dot{\xi})}{\partial x}\xi\right\}dt\right]\right|_{\alpha=0} = \int_{t_1}^{t_2}\left\{\frac{\partial f(t,x,\dot{x})}{\partial\dot{x}}\dot{\xi} + \frac{\partial f(t,x,\dot{x})}{\partial x}\xi\right\}dt.$$

By Theorem 3.1 we must have:

$$\delta F_x[\xi] = \int_{t_1}^{t_2}\left\{\frac{\partial f(t,x,\dot{x})}{\partial\dot{x}}\dot{\xi} + \frac{\partial f(t,x,\dot{x})}{\partial x}\xi\right\}dt = 0 \quad \forall\,\mathcal{D}\text{-admissible } \xi.$$

By Definition 3.6, $\mathcal{D}$-admissible directions are $\mathcal{C}^1[t_1,t_2]^{n_x}$ functions such that $\xi(t_1) = \xi(t_2) = 0$. Note that in this way we have $x + \alpha\xi \in \mathcal{D}$. We now modify the expression of the first variation in order to apply Lemma 3.2. Using integration by parts we have:

$$\int_{t_1}^{t_2} \frac{\partial f(t,x,\dot{x})}{\partial\dot{x}}\dot{\xi}\,dt = \left.\left[\frac{\partial f(t,x,\dot{x})}{\partial\dot{x}}\xi\right]\right|_{t_1}^{t_2} - \int_{t_1}^{t_2}\frac{d}{dt}\frac{\partial f(t,x,\dot{x})}{\partial\dot{x}}\xi\,dt,$$

but since $\xi$ is a $\mathcal{D}$-admissible direction, $\xi(t_1) = \xi(t_2) = 0$, we have:

$$\int_{t_1}^{t_2}\frac{\partial f(t,x,\dot{x})}{\partial\dot{x}}\dot{\xi}\,dt = -\int_{t_1}^{t_2}\frac{d}{dt}\frac{\partial f(t,x,\dot{x})}{\partial\dot{x}}\xi\,dt.$$

The necessary condition for optimality then becomes:

$$\delta F_x[\xi] = \int_{t_1}^{t_2}\left[\frac{\partial f(t,x,\dot{x})}{\partial x} - \frac{d}{dt}\frac{\partial f(t,x,\dot{x})}{\partial\dot{x}}\right]\xi\,dt = 0 \quad \forall\,\mathcal{D}\text{-admissible } \xi.$$

The last equation satisfies the conditions of Lemma 3.2, therefore we must have:

$$\frac{\partial f(t,x,\dot{x})}{\partial x} - \frac{d}{dt}\frac{\partial f(t,x,\dot{x})}{\partial\dot{x}} = 0 \quad \forall t \in [t_1,t_2]. \qquad \square$$
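For a given Lagrangian and candidate trajectory, the Euler equation can be checked mechanically; a minimal sympy sketch for scalar problems (the helper name and the symbols $y$, $z$ standing for $x$ and $\dot{x}$ are our own conventions):

```python
import sympy as sp

t, y, z = sp.symbols('t y z')

def euler_residual(f, x):
    """Residual f_x - d/dt f_xdot of the Euler equation along x(t).

    f : expression in (t, y, z), with y standing for x(t) and z for xdot(t)
    x : expression in t, the candidate trajectory
    """
    subs = {y: x, z: sp.diff(x, t)}
    f_x = sp.diff(f, y).subs(subs)
    f_z = sp.diff(f, z).subs(subs)
    return sp.simplify(f_x - sp.diff(f_z, t))

# example: the Lagrangian f = z**2 with the affine candidate x(t) = 3*t + 1
print(euler_residual(z**2, 3 * t + 1))   # 0: the candidate is stationary
```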

Definition 3.7: Stationary Function

Each $\mathcal{C}^1$ function $\bar{x}$ that satisfies the Euler equations on some interval is called a stationary function for the Lagrangian $f$.

Remark 3.10

Note that the Euler equations are necessary conditions for optimality; therefore a stationary function $\bar{x}$ can be a minimum, a maximum, or neither of them. In the literature stationary functions are also called extremals or extremal trajectories.

Example 3.8 (Simplest calculus of variations problem). Given two points in the plane, $A = (x_1,y_1)$ and $B = (x_2,y_2)$, find the curve of minimum length that joins them. A schematic picture is drawn in Figure 3.5. Obviously, the solution to this problem is the straight line joining $A$ and $B$, but we will work out the solution using the Euler equation. The functional that we want to minimize is the length of the curve; that is, we want to solve the problem:

$$\min_{y(x)} F[y] = \int_{x_1}^{x_2} f(x,y(x),y'(x))\,dx = \int_{x_1}^{x_2} ds = \int_{x_1}^{x_2} \sqrt{1+y'(x)^2}\,dx \quad \text{s.t.} \quad y \in \mathcal{D} = \{y \in \mathcal{C}^1[x_1,x_2] \text{ such that } y(x_1) = y_1,\; y(x_2) = y_2\};$$

note that with respect to the previous notation, $x$ is the independent variable and we are seeking a function $y(x)$. We have also indicated $y'(x) = \frac{dy}{dx}$. The Lagrangian for this problem is $f = \sqrt{1+y'(x)^2}$. Note that it is independent of $y(x)$. The Euler equation for this problem reads:

$$\frac{d}{dx}\frac{\partial f}{\partial y'} = \frac{\partial f}{\partial y} = 0.$$

As a consequence:

$$\frac{d}{dx}\frac{\partial f}{\partial y'} = \frac{d}{dx}\frac{y'}{\sqrt{1+y'(x)^2}} = \frac{y''(x)}{\sqrt{(1+y'^2)^3}} = 0.$$

Since the denominator is always different from zero, we have $y''(x) = 0$; therefore stationary functions have zero second derivative. We can then integrate this condition to derive $y$:

$$y(x) = C_1 x + C_2.$$

Finally, imposing that $y(x) \in \mathcal{D}$, that is, imposing the boundary conditions $y(x_1) = y_1$ and $y(x_2) = y_2$, we obtain:

$$y(x) = y_1 + \frac{y_2 - y_1}{x_2 - x_1}(x - x_1).$$
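sympy ships a helper that forms the Euler equation directly, which makes this example easy to reproduce; a sketch (printed forms may differ up to simplification):

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

x = sp.symbols('x')
y = sp.Function('y')

# length Lagrangian f = sqrt(1 + y'(x)^2)
f = sp.sqrt(1 + sp.diff(y(x), x) ** 2)

# Euler equation generated by sympy: y''(x)/(1 + y'(x)**2)**(3/2) = 0 (up to sign)
eq = euler_equations(f, y(x), x)[0]
print(sp.simplify(eq))

# since the denominator never vanishes, the equation reduces to y'' = 0
print(sp.dsolve(sp.Eq(sp.diff(y(x), x, 2), 0), y(x)))   # y(x) = C1 + C2*x
```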

Figure 3.5: Minimum length problem

Remark 3.11

Depending on the structure of the Lagrangian function $f$, the Euler equations can be simplified.

$f$ does not explicitly depend on the function $x$.
Suppose $f = f(t,\dot{x})$; then the Euler equation becomes $\frac{d}{dt}\frac{\partial f}{\partial\dot{x}} = 0$.

$f$ does not explicitly depend on the function $\dot{x}$.
Suppose $f = f(t,x)$; then the Euler equation becomes $\frac{\partial f}{\partial x} = 0$. This is a degenerate case.

$f$ does not explicitly depend on the independent variable $t$.
Suppose $f = f(x,\dot{x})$; we define the Hamiltonian function $H(x,\dot{x}) = f - \frac{\partial f}{\partial\dot{x}}\dot{x}$. If $\bar{x}$ is a stationary function then the Hamiltonian is constant. The total time derivative of the Hamiltonian is:

$$\frac{d}{dt}H = \frac{d}{dt}\left(f - \frac{\partial f}{\partial\dot{x}}\dot{x}\right) = \frac{\partial f}{\partial x}\dot{x} + \frac{\partial f}{\partial\dot{x}}\ddot{x} - \left(\frac{d}{dt}\frac{\partial f}{\partial\dot{x}}\right)\dot{x} - \frac{\partial f}{\partial\dot{x}}\ddot{x} = \left(\frac{\partial f}{\partial x} - \frac{d}{dt}\frac{\partial f}{\partial\dot{x}}\right)\dot{x} = 0.$$

Therefore an equivalent necessary condition is $H = f - \frac{\partial f}{\partial\dot{x}}\dot{x} = C$, where $C$ is a constant. This is also known as the Beltrami identity.
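The Beltrami identity is easy to verify on a concrete case; a sketch with the time-independent Lagrangian $f = \dot{x}^2 - x^2$ (our own example), for which $x(t) = \sin t$ solves the Euler equation $\ddot{x} = -x$:

```python
import sympy as sp

t, y, z = sp.symbols('t y z')      # y stands for x, z for xdot

f = z**2 - y**2                    # Lagrangian with no explicit t-dependence
H = f - sp.diff(f, z) * z          # Beltrami quantity H = f - f_xdot * xdot

x = sp.sin(t)                      # stationary trajectory: xddot = -x
H_along = H.subs({y: x, z: sp.diff(x, t)})

print(sp.simplify(H_along))              # -1: H is constant along the extremal
print(sp.simplify(sp.diff(H_along, t)))  # 0: dH/dt vanishes, as claimed
```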

Example 3.9 (A more challenging problem: The Brachistochrone). Recall the Brachistochrone Problem 3.1:

$$\min_{x} F[x] = \int_{\xi_A}^{\xi_B} \frac{\sqrt{1+x'(\xi)^2}}{\sqrt{v_A^2 + 2g\,(x(\xi)-x_A)}}\,d\xi \quad \text{s.t.} \quad x \in \mathcal{D} = \{x \in \mathcal{C}^1[\xi_1,\xi_2] \text{ such that } x(\xi_1) = x_A,\; x(\xi_2) = x_B\}.$$

For the sake of simplicity, we consider a slightly simplified problem. Assume that the initial velocity $v_A = 0$ and that the initial point is the origin, that is, $A = (0,0)$, with the coordinate $x$ measured positive downward. The Lagrangian for this problem is $f = \frac{\sqrt{1+x'(\xi)^2}}{\sqrt{2g\,x(\xi)}}$. Note that the Lagrangian is independent of $\xi$, hence the Hamiltonian is constant. Therefore, we have:

$$H = f - x'\frac{\partial f}{\partial x'} = \frac{\sqrt{1+x'(\xi)^2}}{\sqrt{2g\,x(\xi)}} - \frac{x'^2}{\sqrt{2g\,x\,(1+x'^2)}} = \frac{1}{\sqrt{2g}}\,\frac{1}{\sqrt{x\,(1+x'^2)}} = C_0,$$

which is equivalent to:

$$x\,(1+x'^2) = 2C_1,$$

for a new constant $C_1 = \frac{1}{4gC_0^2}$. Thus we have:

$$x' = \sqrt{\frac{2C_1 - x}{x}},$$

that is equivalent to:

$$d\xi = \sqrt{\frac{x}{2C_1 - x}}\,dx.$$

In order to solve the previous integral we make the substitution $x = C_1(1+\cos(2\psi))$, therefore $dx = -2C_1\sin(2\psi)\,d\psi$. The integral becomes:

$$\int_{x_A}^{x}\sqrt{\frac{x}{2C_1-x}}\,dx = \int_{\psi_A}^{\psi}\sqrt{\frac{C_1(1+\cos(2\psi))}{2C_1 - C_1(1+\cos(2\psi))}}\,\bigl(-2C_1\sin(2\psi)\bigr)\,d\psi = -2C_1\int_{\psi_A}^{\psi}\sqrt{\frac{(1+\cos(2\psi))\,\sin^2(2\psi)}{1-\cos(2\psi)}}\,d\psi = -2C_1\int_{\psi_A}^{\psi}\sqrt{\frac{(1+\cos(2\psi))(1-\cos^2(2\psi))}{1-\cos(2\psi)}}\,d\psi = -2C_1\int_{\psi_A}^{\psi}(1+\cos(2\psi))\,d\psi = a - C_1\,(2\psi + \sin(2\psi)),$$

where $a$ is a constant to be determined. Overall we have obtained a curve in the parameter $\psi$, that is:

$$x = C_1(1+\cos(2\psi)), \qquad \xi = a - C_1\,(2\psi + \sin(2\psi)).$$

This is the parametric form of a family of cycloids depending on the constants $C_1$ (that in turn depends on $C_0$) and $a$. The first boundary condition allows us to easily find $a$ in terms of $C_1$. That is:

$$x_A = 0 = C_1(1+\cos(2\psi_A)) \;\Rightarrow\; \psi_A = \frac{\pi}{2}, \qquad \xi_A = 0 = a - C_1\left[2\psi_A + \sin(2\psi_A)\right] \;\Rightarrow\; a = C_1\pi.$$

Thus, we obtain a smaller family of cycloids passing through the origin, as shown in Figure 3.6, whose equations are:

$$x = C_1(1+\cos(2\psi)), \qquad \xi = C_1\,(\pi - 2\psi - \sin(2\psi));$$

note that these are all minimum-time curves. The second boundary condition is somewhat more involved and is solved numerically; the nonlinear system is:

$$x_B = C_1(1+\cos(2\psi_B)), \qquad \xi_B = C_1\,(\pi - 2\psi_B - \sin(2\psi_B)).$$

The solution joining points $A = (0,0)$ and $B = (6,5)$ is shown in red in Figure 3.6; the constant is $C_1 \approx 2.62$.
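The nonlinear system for $C_1$ and $\psi_B$ can be solved with any standard root finder; a sketch with scipy (the initial guess is picked by hand, recalling that $\psi$ decreases from $\pi/2$ along the curve):

```python
import numpy as np
from scipy.optimize import fsolve

xi_B, x_B = 6.0, 5.0   # target endpoint B = (6, 5)

def residual(u):
    C1, psi_B = u
    return [x_B - C1 * (1 + np.cos(2 * psi_B)),
            xi_B - C1 * (np.pi - 2 * psi_B - np.sin(2 * psi_B))]

C1, psi_B = fsolve(residual, [1.0, 0.4])
print(C1, psi_B)   # C1 is approximately 2.62, matching the figure
```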

Figure 3.6: Minimum time curves originating from point A in blue. Minimum time curve joining points A and B in red.

Remark 3.12: Hamilton’s Interpretation of Lagrangian Mechanics

According to the principle of least action, a system in motion will always follow the trajectory that minimizes the action functional:

$$A[q] = \int_{t_1}^{t_2} L(q,\dot{q})\,dt = \int_{t_1}^{t_2} \left(T(q,\dot{q}) - V(q)\right)dt,$$

where $q \in \mathcal{C}^1[t_1,t_2]^{n}$ are the generalized Lagrangian coordinates of an $n$-degrees-of-freedom system, $T(q,\dot{q})$ is the kinetic energy, and $V(q)$ is the potential energy. The Euler necessary condition reads:

$$\frac{\partial}{\partial q}(T-V) - \frac{d}{dt}\frac{\partial}{\partial\dot{q}}(T-V) = 0 \;\Longleftrightarrow\; \frac{d}{dt}\frac{\partial T}{\partial\dot{q}} - \frac{\partial T}{\partial q} + \frac{\partial V}{\partial q} = 0,$$

which is the usual Lagrange equation of analytical mechanics. In view of this mechanical interpretation, recall the definition of the Hamiltonian $H = f - \frac{\partial f}{\partial\dot{x}}\dot{x}$. In general the kinetic energy has the form $T(q,\dot{q}) = \frac{1}{2}\dot{q}^\top M(q)\,\dot{q}$. In this case the Hamiltonian reads:

$$H = T - V - \frac{\partial(T-V)}{\partial\dot{q}}\dot{q} = T - V - 2T = -(T+V) = -E,$$

where $E$ is the total energy of the system. According to Remark 3.11, if the Lagrangian is independent of time $t$ the Hamiltonian is constant along an extremal trajectory; in other words, the total energy of the system is conserved. Finally, recall that the momentum $m$ of a mechanical system can be defined as $m = \frac{\partial T}{\partial\dot{q}}$; hence the special case when the Lagrangian is independent of $x$ in Remark 3.11 can be viewed as momentum conservation, that is, $\frac{d}{dt}\frac{\partial T}{\partial\dot{q}} = \frac{d}{dt}m = 0$.

Analogously to NLP problems, for problems of the calculus of variations we can also derive a second-order necessary condition. Recall that in the finite-dimensional NLP case the second-order necessary condition was the positive semidefiniteness of the Hessian matrix, which is equivalent to the objective function being locally convex in the neighbourhood of a stationary point. Similar conditions hold in the infinite-dimensional case. First, we need the following definition:

Definition 3.8: Second Variation of a Functional

Let $F$ be a functional defined on a function space $\mathcal{X}$. Then, the second variation of $F$ at $x \in \mathcal{X}$ in the direction $\xi \in \mathcal{X}$ is defined as:

$$\delta^2 F_x[\xi] = \left.\frac{\partial^2}{\partial\alpha^2} F[x+\alpha\xi]\right|_{\alpha=0}$$

(provided it exists).

Theorem 3.4: Second-Order Necessary Conditions (Legendre)

Consider the problem to minimize the functional:

$$\min_{x(t)} F[x] = \int_{t_1}^{t_2} f(t,x(t),\dot{x}(t))\,dt \quad \text{s.t.} \quad x \in \mathcal{D},$$

where the functional space is defined as $\mathcal{D} = \{x \in \mathcal{C}^1[t_1,t_2]^{n_x} \text{ such that } x(t_1) = x_1,\; x(t_2) = x_2\}$ and $f: \mathbb{R}\times\mathbb{R}^{n_x}\times\mathbb{R}^{n_x} \to \mathbb{R}$ is a continuously differentiable function. Suppose that $x \in \mathcal{D}$ gives a (local) minimum for $F$ on $\mathcal{D}$. Then $x$ satisfies the Euler equations along with the so-called Legendre condition:

$$\frac{\partial^2 f}{\partial\dot{x}^2} \succeq 0 \quad \forall t \in [t_1,t_2].$$

Proof. Based on the differentiability properties of Theorem 3.2 we can compute the second variation as:

$$\delta^2 F_x[\xi] = \left.\frac{\partial^2}{\partial\alpha^2} F[x+\alpha\xi]\right|_{\alpha=0} = \left.\int_{t_1}^{t_2}\frac{\partial^2}{\partial\alpha^2} f[x+\alpha\xi]\,dt\right|_{\alpha=0} = \left.\int_{t_1}^{t_2}\left(\xi^\top f_{xx}[x+\alpha\xi]\,\xi + 2\,\xi^\top f_{x\dot{x}}[x+\alpha\xi]\,\dot{\xi} + \dot{\xi}^\top f_{\dot{x}\dot{x}}[x+\alpha\xi]\,\dot{\xi}\right)dt\right|_{\alpha=0},$$

where we have used the compressed notation $f[x+\alpha\xi] := f(t,x(t)+\alpha\xi(t),\dot{x}(t)+\alpha\dot{\xi}(t))$. Substituting $\alpha = 0$ gives:

$$\delta^2 F_x[\xi] = \int_{t_1}^{t_2}\left(\xi^\top f_{xx}[x]\,\xi + 2\,\xi^\top f_{x\dot{x}}[x]\,\dot{\xi} + \dot{\xi}^\top f_{\dot{x}\dot{x}}[x]\,\dot{\xi}\right)dt.$$

Define the real-valued function of $\alpha$, $g(\alpha) := F[x+\alpha\xi]$. We have that $\alpha = 0$ satisfies the necessary conditions for a local minimum of an unconstrained optimization problem, since $g(0) = F[x]$ is a local minimum for the functional $F$. Therefore $\left.\frac{dg}{d\alpha}\right|_0 = 0$ and $\left.\frac{d^2g}{d\alpha^2}\right|_0 \geq 0$. Note that $\left.\frac{d^2g}{d\alpha^2}\right|_0 = \left.\frac{\partial^2}{\partial\alpha^2} F[x+\alpha\xi]\right|_{\alpha=0} = \delta^2 F_x[\xi]$. Hence we have:

$$\left.\frac{d^2g}{d\alpha^2}\right|_0 = \delta^2 F_x[\xi] = \int_{t_1}^{t_2}\left(\xi^\top f_{xx}[x]\,\xi + 2\,\xi^\top f_{x\dot{x}}[x]\,\dot{\xi} + \dot{\xi}^\top f_{\dot{x}\dot{x}}[x]\,\dot{\xi}\right)dt \geq 0.$$

The last inequality must hold true for each perturbation $\xi \in \mathcal{C}^1[t_1,t_2]^{n_x}$ such that $\xi(t_1) = \xi(t_2) = 0$. We use Einstein's notation to imply the double summation, that is:

$$\xi^\top A\,\xi = \sum_{i=1}^{n_x}\sum_{j=1}^{n_x} a_{i,j}\,\xi_i\,\xi_j \equiv a_{i,j}\,\xi_i\,\xi_j;$$

to simplify the notation, let us define $A := f_{\dot{x}x}[x]$. Note that due to Schwarz's theorem $A$ is symmetric. Using integration by parts we have:

$$\int_{t_1}^{t_2} a_{i,j}\,\xi_i\,\dot{\xi}_j\,dt = \left.(a_{i,j}\,\xi_i\,\xi_j)\right|_{t_1}^{t_2} - \int_{t_1}^{t_2}\left(\dot{a}_{i,j}\,\xi_i\,\xi_j + a_{i,j}\,\dot{\xi}_i\,\xi_j\right)dt;$$

then, using the symmetry of $A$ and the boundary conditions on $\xi$, we have:

$$\int_{t_1}^{t_2} 2\,a_{i,j}\,\xi_i\,\dot{\xi}_j\,dt = -\int_{t_1}^{t_2} \dot{a}_{i,j}\,\xi_i\,\xi_j\,dt;$$

hence we have:

$$2\int_{t_1}^{t_2} \xi^\top f_{\dot{x}x}[x]\,\dot{\xi}\,dt = -\int_{t_1}^{t_2} \xi^\top\left(\frac{d}{dt}f_{\dot{x}x}[x]\right)\xi\,dt.$$

The necessary condition on the second variation becomes:

$$\delta^2 F_x[\xi] = \int_{t_1}^{t_2}\left\{\xi^\top\left(f_{xx}[x] - \frac{d}{dt}f_{\dot{x}x}[x]\right)\xi + \dot{\xi}^\top f_{\dot{x}\dot{x}}[x]\,\dot{\xi}\right\}dt \geq 0;$$

by Lemma 3.3, a necessary condition for $\delta^2 F_x[\xi]$ to be nonnegative is:

$$f_{\dot{x}\dot{x}}[x] = \frac{\partial^2 f}{\partial\dot{x}^2} \succeq 0 \quad \forall t \in [t_1,t_2]. \qquad \square$$

Lemma 3.3

Let $P(t)$ and $Q(t)$ be given continuous $(n_x \times n_x)$ symmetric matrix functions on $[t_1,t_2]$, and let the quadratic functional:

$$\int_{t_1}^{t_2} \xi(t)^\top Q(t)\,\xi(t) + \dot{\xi}(t)^\top P(t)\,\dot{\xi}(t)\,dt$$

be defined for all $\xi \in \mathcal{C}^1[t_1,t_2]^{n_x}$ such that $\xi(t_1) = \xi(t_2) = 0$. Then, a necessary condition for the quadratic functional to be nonnegative for all such $\xi$ is that $P(t) \succeq 0$ for each $t \in [t_1,t_2]$.
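As a quick illustration of the Legendre condition, the length Lagrangian of Example 3.8 passes it with room to spare, since $f_{y'y'}$ is strictly positive; a small sympy sketch:

```python
import sympy as sp

z = sp.symbols('z', real=True)   # z stands for y'
f = sp.sqrt(1 + z**2)            # length Lagrangian of Example 3.8

f_zz = sp.simplify(sp.diff(f, z, 2))
print(f_zz)                      # (z**2 + 1)**(-3/2): strictly positive

assert f_zz.is_positive          # sympy confirms positivity for real z
```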

It would be tempting to extend the analogy with NLP problems and look for a sufficient condition. Recall that a sufficient condition for a local minimum of an unconstrained NLP was stationarity together with positive definiteness of the Hessian at the candidate solution, see Theorem 2.8. Unfortunately, the requirements that $\bar{x}$ be a stationary function for $F$ and that the Lagrangian $f(t,y,z)$ be strictly convex with respect to the third argument $z$ are not sufficient for $\bar{x}$ to be a (weak) local minimum of a free problem of the calculus of variations. Additional conditions must hold, such as the absence of points conjugate to the point $t_1$ in $[t_1,t_2]$; this condition is called Jacobi's sufficient condition. We refer the reader to the vast literature on this topic. Instead, we give a sufficient condition for a global minimum that relies on the Lagrangian $f(t,y,z)$ being jointly convex in the second and third arguments $y$ and $z$, respectively. Unfortunately this condition is seldom satisfied in practice. Recall that for a continuously differentiable function $j(x,y): C_x \times C_y \to \mathbb{R}$, with $C_x \subseteq \mathbb{R}^{n_x}$ and $C_y \subseteq \mathbb{R}^{n_y}$, joint convexity means:

$$j(x,y) \geq j(x_0,y_0) + \nabla_x j(x_0,y_0)^\top[x - x_0] + \nabla_y j(x_0,y_0)^\top[y - y_0] \quad \forall x \in C_x \text{ and } y \in C_y.$$

Theorem 3.5: Sufficient Conditions for a Weak Global Minimum

Consider the problem to minimize the functional:

$$\min_{x(t)} F[x] = \int_{t_1}^{t_2} f(t,x(t),\dot{x}(t))\,dt \quad \text{s.t.} \quad x \in \mathcal{D},$$

where the functional space is defined as $\mathcal{D} = \{x \in \mathcal{C}^1[t_1,t_2]^{n_x} \text{ such that } x(t_1) = x_1,\; x(t_2) = x_2\}$ and $f: \mathbb{R}\times\mathbb{R}^{n_x}\times\mathbb{R}^{n_x} \to \mathbb{R}$ is a continuously differentiable function. Suppose that the Lagrangian $f(t,y,z)$ is [strictly] jointly convex in $(y,z)$. If $x^*$ is a stationary function for the Lagrangian $f$ (i.e., it satisfies the Euler necessary conditions), then $x^*$ is also a [strict] global minimizer for $F$ on $\mathcal{D}$.

Proof.

$$F[x] - F[x^*] = \int_{t_1}^{t_2} f(t,x,\dot{x}) - f(t,x^*,\dot{x}^*)\,dt \geq \int_{t_1}^{t_2} \frac{\partial f(t,x^*,\dot{x}^*)}{\partial x}[x - x^*] + \frac{\partial f(t,x^*,\dot{x}^*)}{\partial\dot{x}}[\dot{x} - \dot{x}^*]\,dt = \int_{t_1}^{t_2}\left(\frac{\partial f(t,x^*,\dot{x}^*)}{\partial x} - \frac{d}{dt}\frac{\partial f(t,x^*,\dot{x}^*)}{\partial\dot{x}}\right)[x - x^*]\,dt + \left.\left(\frac{\partial f(t,x^*,\dot{x}^*)}{\partial\dot{x}}[x - x^*]\right)\right|_{t_1}^{t_2} = 0,$$

where we have used the joint convexity of $f$ in the second and third arguments. Subsequently, we make use of integration by parts and of the fact that $x$ and $x^*$ are equal at the boundary, that is, $x(t_1) = x^*(t_1)$ and $x(t_2) = x^*(t_2)$, together with the hypothesis that $x^*$ is a stationary function. Overall we have:

$$F[x] - F[x^*] \geq 0 \quad \forall x \in \mathcal{D},$$

which is the definition of global minimum of a functional. □

Example 3.10. Consider the problem P to minimize the functional

$$F[x] := \int_0^1 \dot{x}(t)^2\,dt$$

on $\mathcal{D} := \{x \in \mathcal{C}^1[0,1] : x(0) = 0,\; x(1) = x_1\}$. The Lagrangian $f(t,y,z) = z^2$ is convex in $z$ and does not depend on $y$; hence, thanks to Theorem 3.5, every stationary function is a global minimizer. The Euler necessary condition for this case reduces to:

$$\frac{d}{dt}\frac{\partial f}{\partial\dot{x}} = 2\ddot{x} = 0;$$

therefore, by double integration and imposing the boundary conditions, we obtain:

$$x(t) = t\,x_1,$$

that is, a straight line joining the origin and the point $P = (1,x_1)$. The extremal function $x$ is the unique global minimum thanks to Theorem 3.5, since it is the unique stationary function of the functional to be minimized.

3.3.1 Problems with free end points

We now consider slightly different and more involved problems. Consider the problem in the following definition:

Definition 3.9: Calculus of Variation Free End-point problem

Consider the Calculus of Variations problem with free end-point and end-time:

$$\min_{x(t)} F[x] = \int_{t_1}^{t_2} f(t,x(t),\dot{x}(t))\,dt + \phi(t_2,x(t_2)) \quad \text{s.t.} \quad x \in \mathcal{D} = \{(x,t_2) \in \mathcal{C}^1[t_1,T]^{n_x}\times\mathbb{R} \text{ such that } x(t_1) = x_1\}.$$

Remark 3.13

The cost functional in Definition 3.9 represents a Bolza problem. Note that neither the end-time $t_2$ nor the end-point $x(t_2)$ is specified, so our search space of functions is in some sense larger. The search space is defined as $\mathcal{C}^1[t_1,T]^{n_x}$ for sufficiently large $T$, so that it can include $t_2$. We need to derive additional conditions that an extremal must satisfy. These conditions should allow us to find an equation for the end-point $x(t_2)$ and the end-time $t_2$ at which this end-point is reached. Informally, we are then looking for $n_x + 1$ additional equations: $n_x$ equations are needed for $x(t_2)$, while one equation is needed for the final time $t_2$.

We give without proof an important theorem that is instrumental to derive optimality conditions for this problem.

Theorem 3.6: Leibniz's Rule

Let $f: \mathbb{R}\times\mathbb{R} \to \mathbb{R}$, $f := f(t,\alpha)$, be a continuous function with continuous partial derivative $f_\alpha = \frac{\partial f}{\partial\alpha}$ on $[t_1,t_2]\times[\alpha_1,\alpha_2]$, and let $h \in \mathcal{C}^1[\alpha_1,\alpha_2]$. Then

$$\frac{d}{d\alpha}\int_{t_1}^{h(\alpha)} f(t,\alpha)\,dt = \int_{t_1}^{h(\alpha)} f_\alpha(t,\alpha)\,dt + \frac{dh}{d\alpha}\,f(h(\alpha),\alpha).$$
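Leibniz's rule with a moving endpoint can also be spot-checked numerically; a sketch where $f(t,\alpha) = e^{-\alpha t^2}$ and $h(\alpha) = 1 + \alpha^2$ are arbitrary smooth choices:

```python
import numpy as np
from scipy.integrate import quad

f = lambda t, a: np.exp(-a * t**2)                # f(t, alpha)
f_a = lambda t, a: -t**2 * np.exp(-a * t**2)      # partial derivative in alpha
h = lambda a: 1.0 + a**2                          # moving upper limit
h_a = lambda a: 2.0 * a                           # h'(alpha)

t1, a0, eps = 0.0, 0.7, 1e-6

G = lambda a: quad(f, t1, h(a), args=(a,))[0]

lhs = (G(a0 + eps) - G(a0 - eps)) / (2 * eps)     # d/dalpha of the integral
rhs = quad(f_a, t1, h(a0), args=(a0,))[0] + h_a(a0) * f(h(a0), a0)
print(lhs, rhs)                                   # the two sides coincide
```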

Theorem 3.7: Necessary conditions for free end-point and end-time problems

Consider the problem in Definition 3.9. If x is a local minimum point then x must satisfy:

$$\frac{\partial f[x]}{\partial x} - \frac{d}{dt}\frac{\partial f[x]}{\partial\dot{x}} = 0 \quad \forall t \in [t_1,t_2],$$

$$\frac{\partial f[x(t_2)]}{\partial\dot{x}} + \frac{\partial\phi(t_2,x(t_2))}{\partial x} = 0,$$

$$H(t_2,x(t_2),\dot{x}(t_2)) + \frac{\partial\phi(t_2,x(t_2))}{\partial t} = 0.$$

Proof. Recall that the geometric necessary condition in Theorem 3.1 does not assume any specific form for the functional $F$. Hence, a necessary condition for a local minimum of a free end-point problem is again $\delta F_x[\xi] = 0$ for all $\mathcal{D}$-admissible directions $\xi$. Note that in order to be a $\mathcal{D}$-admissible direction for this problem, $\xi \in \mathcal{C}^1[t_1,t_2]^{n_x}$ must be such that $\xi(t_1) = 0$. We highlight again the fact that $t_2$ is not given and that $\xi$ need not satisfy any boundary condition at $t_2$. Since $t_2$ is not fixed, we need to allow variations of the end-time around the optimal time $t_2$; thus we consider end-time variations of the form $\tilde{t}_2 = t_2 + \alpha\tau$. It is essential to notice that the modulating parameter $\alpha$ is the same for variations of the end-time and of the trajectory $\xi$. With this assumption we do not lose generality, since $\tau$ and $\xi$ are arbitrary. The first variation of the Bolza-type functional is:

$$\delta F_x[\xi] = \left.\left(\frac{\partial}{\partial\alpha}\int_{t_1}^{t_2+\alpha\tau} f(t,x(t)+\alpha\xi(t),\dot{x}(t)+\alpha\dot{\xi}(t))\,dt\right)\right|_{\alpha=0} + \left.\left(\frac{\partial}{\partial\alpha}\phi\bigl(t_2+\alpha\tau,\; x(t_2+\alpha\tau)+\alpha\xi(t_2+\alpha\tau)\bigr)\right)\right|_{\alpha=0};$$

we carry out the derivative term by term. Note that in the first term we can apply Theorem 3.6. Thus we have:

$$\left.\left(\frac{\partial}{\partial\alpha}\int_{t_1}^{t_2+\alpha\tau} f(t,x+\alpha\xi,\dot{x}+\alpha\dot{\xi})\,dt\right)\right|_{\alpha=0} = \left.\left(\int_{t_1}^{t_2+\alpha\tau}\left(\frac{\partial f[x+\alpha\xi]}{\partial x}\xi + \frac{\partial f[x+\alpha\xi]}{\partial\dot{x}}\dot{\xi}\right)dt + \tau\,f[x(t_2+\alpha\tau)+\alpha\xi(t_2+\alpha\tau)]\right)\right|_{\alpha=0} = \int_{t_1}^{t_2}\left(\frac{\partial f[x]}{\partial x}\xi + \frac{\partial f[x]}{\partial\dot{x}}\dot{\xi}\right)dt + \tau\,f[x(t_2)];$$

again we have used the compressed notation $f[x(t_2+\alpha\tau)+\alpha\xi(t_2+\alpha\tau)] = f\bigl(t_2+\alpha\tau,\; x(t_2+\alpha\tau)+\alpha\xi(t_2+\alpha\tau),\; \dot{x}(t_2+\alpha\tau)+\alpha\dot{\xi}(t_2+\alpha\tau)\bigr)$. Using integration by parts and the boundary condition $\xi(t_1) = 0$ we have:

$$\int_{t_1}^{t_2}\left(\frac{\partial f[x]}{\partial x}\xi + \frac{\partial f[x]}{\partial\dot{x}}\dot{\xi}\right)dt = \int_{t_1}^{t_2}\left(\frac{\partial f[x]}{\partial x} - \frac{d}{dt}\frac{\partial f[x]}{\partial\dot{x}}\right)\xi\,dt + \frac{\partial f[x(t_2)]}{\partial\dot{x}}\xi(t_2),$$

while the second term in the expression of the first variation is:

$$\left.\left(\frac{\partial}{\partial\alpha}\phi\bigl(t_2+\alpha\tau,\; x(t_2+\alpha\tau)+\alpha\xi(t_2+\alpha\tau)\bigr)\right)\right|_{\alpha=0} = \frac{\partial\phi(t_2,x(t_2))}{\partial t}\tau + \frac{\partial\phi(t_2,x(t_2))}{\partial x}\left[\dot{x}(t_2)\tau + \xi(t_2)\right];$$

the geometric necessary condition for optimality is therefore:

$$\delta F_x[\xi] = \int_{t_1}^{t_2}\left(\frac{\partial f[x]}{\partial x} - \frac{d}{dt}\frac{\partial f[x]}{\partial\dot{x}}\right)\xi\,dt + \frac{\partial f[x(t_2)]}{\partial\dot{x}}\xi(t_2) + \frac{\partial\phi(t_2,x(t_2))}{\partial t}\tau + \frac{\partial\phi(t_2,x(t_2))}{\partial x}\left[\dot{x}(t_2)\tau + \xi(t_2)\right] + \tau\,f[x(t_2)] = 0 \quad \forall\,\mathcal{D}\text{-admissible } \xi,\tau;$$

since the necessary condition must hold along any $\mathcal{D}$-admissible direction $\xi$ and end-time variation $\tau$, the underlying rationale is to choose variations that allow us to derive a set of necessary conditions in the form of differential equations or boundary conditions. First we consider a family of variations $\mathcal{D}_1$ such that $\tau = 0$ and such that $\xi(t_2) = 0$ holds also at the second endpoint. Note that these variations are indeed $\mathcal{D}$-admissible. Therefore we have:

$$\delta F_x[\xi] = \int_{t_1}^{t_2}\left(\frac{\partial f[x]}{\partial x} - \frac{d}{dt}\frac{\partial f[x]}{\partial\dot{x}}\right)\xi\,dt = 0 \quad \forall \xi \in \mathcal{D}_1.$$

Note that the previous equation is exactly the equation from which we derived the Euler necessary condition. We conclude that the Euler equation is a necessary condition and must hold at a stationary point also in this case:

$$\frac{\partial f[x]}{\partial x} - \frac{d}{dt}\frac{\partial f[x]}{\partial\dot{x}} = 0 \quad \forall t \in [t_1,t_2].$$

Note that $t_2$ is yet to be determined. Now we consider a second family of $\mathcal{D}$-admissible directions $\mathcal{D}_2$ such that $\tau = 0$ but $\xi(t_2)$ is free to vary. Since the Euler equation is satisfied we have:

$$\delta F_x[\xi] = \frac{\partial f[x(t_2)]}{\partial\dot{x}}\xi(t_2) + \frac{\partial\phi(t_2,x(t_2))}{\partial x}\xi(t_2) = 0 \quad \forall \xi \in \mathcal{D}_2;$$

we can conclude that:

$$\frac{\partial f[x(t_2)]}{\partial\dot{x}} + \frac{\partial\phi(t_2,x(t_2))}{\partial x} = 0;$$

this is a set of $n_x$ boundary conditions at $t_2$ that implicitly defines $x(t_2)$. These conditions are called transversality conditions. Finally, we use a family of variations $\mathcal{D}_3$ that nullifies the term $[\xi(t_2) + \tau\dot{x}(t_2)]$, hence variations that satisfy $\xi(t_2) = -\tau\dot{x}(t_2)$ while $\tau$ is left free to vary. Again, the Euler equation is satisfied and thus we have:

$$\delta F_x[\xi] = \left(f[x(t_2)] - \frac{\partial f[x(t_2)]}{\partial\dot{x}}\dot{x}(t_2) + \frac{\partial\phi(t_2,x(t_2))}{\partial t}\right)\tau = 0 \quad \forall \tau.$$

Therefore we have:

$$f[x(t_2)] - \frac{\partial f[x(t_2)]}{\partial\dot{x}}\dot{x}(t_2) + \frac{\partial\phi(t_2,x(t_2))}{\partial t} = 0.$$

This is a scalar equation that implicitly defines $t_2$. Note that the first two terms are precisely the definition of the Hamiltonian evaluated at the final optimal instant. The condition can be rewritten as:

$$H(t_2,x(t_2),\dot{x}(t_2)) + \frac{\partial\phi(t_2,x(t_2))}{\partial t} = 0. \qquad \square$$

Remark 3.14

A similar reasoning holds for more general Bolza problems where the function $\phi$ depends also on the initial time, that is, $\phi(t_1,x(t_1),t_2,x(t_2))$. In this case, neither the initial point $x(t_1)$ nor the initial time $t_1$ is fixed, and hence the $\mathcal{D}$-admissible directions need not satisfy $\xi(t_1) = 0$. In this case, the transversality condition and the end-point condition on the Hamiltonian must hold also at the initial time.

Remark 3.15

The free end-point problem of Lagrange (i.e., when $\phi \equiv 0$) can be handled in the same way; applying the necessary conditions of Theorem 3.7 yields the so-called natural boundary conditions:

$$H(t_2,x(t_2),\dot{x}(t_2)) = 0, \qquad \frac{\partial f[x(t_2)]}{\partial\dot{x}} = 0;$$

these boundary conditions must hold at a stationary point together with the Euler equations. Note that if t2 is fixed while x(t2) is left free only the second equation must hold and vice versa.

Example 3.11. Consider the problem to minimize the functional:

$$\min_{x(t) \in \mathcal{D}} F[x] = \int_{t_1}^{t_2} \frac{1}{2}\dot{x}(t)^2\,dt + \frac{1}{2}p\,(x(t_2) - x_2)^2 \quad \text{s.t.} \quad x \in \mathcal{D} = \{x \in \mathcal{C}^1[t_1,t_2] \text{ such that } x(t_1) = x_1\},$$

in which the final time $t_2$ is fixed but not the endpoint $x(t_2)$; $p \in \mathbb{R}_+$ is a scalar weight. In this case $\phi(t_2,x(t_2)) = \frac{1}{2}p\,(x(t_2)-x_2)^2$ is a strongly convex nonnegative function that penalizes the distance of the final endpoint from the given target $x_2$. We highlight again the fact that $x(t_2)$ is not given and must be found using the transversality condition. The term under the integral sign is a strongly convex function that penalizes high values of the derivative of $x$. Informally, given an initial point $x(t_1) = x_1$, we want to find the curve that reaches a trade-off between having a small derivative along its trajectory and approaching as close as possible the target $x_2$. Obviously the relative weight of these objectives depends on the magnitude of $p$. For the sake of simplicity, let us consider $t_1 = 0$. The Euler necessary condition reads:

$$\ddot{x} = 0 \;\Rightarrow\; x(t) = x_1 + A\,t;$$

note that we can only assign the initial boundary condition $x(t_1) = x_1$. The transversality condition reads:

$$\dot{x}(t_2) + p\,(x(t_2) - x_2) = 0 \;\Rightarrow\; A + p\,(x_1 + A\,t_2 - x_2) = 0 \;\Rightarrow\; A = \frac{p\,(x_2 - x_1)}{1 + p\,t_2}.$$

Therefore the extremal trajectory is:

$$x(t) = x_1 + \frac{p\,(x_2 - x_1)}{1 + p\,t_2}\,t;$$

note that as $p \to \infty$ we have $x(t) \to x_1 + \frac{x_2 - x_1}{t_2}\,t$, that is, $x(t_2) \to x_2$ in the limit of an infinite weight $p$. This would be the solution of the Lagrange problem with the same Lagrangian, $\phi \equiv 0$, and the endpoint $x(t_2) = x_2$ fixed. On the other hand, for a vanishing weight $p$ (i.e., $p \to 0$) we have that $x(t) \to x_1$ tends to a constant value throughout the trajectory. Indeed a constant trajectory is the global minimum for the functional $F[x] = \int_{t_1}^{t_2}\frac{1}{2}\dot{x}^2\,dt$ where only the initial point is fixed. The extremal trajectories for a finite $p$ are straight lines on the interval $[0,t_2]$ that reach the optimal trade-off between minimizing the value of the derivative and reaching the target. A simple solution with $t_2 = 1$ and $x_1 = 1$, $x_2 = 2$ is shown in Figure 3.7. In this simple case we can also compute analytically the optimal value of the cost functional as a function of the weight $p$. Since $\dot{x} = \frac{p}{1+p}$ and $x(t_2) = 1 + \frac{p}{1+p}$, we have:

$$F[x] = \int_0^1 \frac{1}{2}\left(\frac{p}{1+p}\right)^2 dt + \frac{1}{2}p\left(1 + \frac{p}{1+p} - 2\right)^2 = \frac{p}{2(1+p)}.$$
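The closed form is easy to confirm numerically; a sketch that evaluates the cost of the extremal trajectory for a few weights, with $t_2 = 1$, $x_1 = 1$, $x_2 = 2$ as in the example:

```python
t1, t2, x1, x2 = 0.0, 1.0, 1.0, 2.0

def extremal_cost(p):
    A = p * (x2 - x1) / (1 + p * t2)    # slope from the transversality condition
    x_end = x1 + A * t2                 # endpoint reached by the extremal
    integral = 0.5 * A**2 * (t2 - t1)   # int of (1/2) xdot^2 for constant slope
    penalty = 0.5 * p * (x_end - x2)**2
    return integral + penalty

for p in (0.1, 1.0, 10.0, 1000.0):
    print(p, extremal_cost(p), p / (2 * (1 + p)))   # last two columns match
```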

Figure 3.7: Extremal trajectories for increasing weight p. Limiting cases p → ∞ and p → 0 shown in green and red, respectively.