In this section, we will derive a set of necessary optimality conditions that a trajectory must satisfy in order to be a candidate solution for a weak local minimum of a cost functional . Note that if we work with once continuously differentiable functions, every strong extremum (what we mean by "extremum" will be made clear later on) is automatically a weak extremum; as a consequence, the necessary conditions valid for weak extrema are also necessary for strong extrema. We start by considering the basic problem of the calculus of variations according to Definition 3.1, in which the admissible set is:
that is we are looking for a minimizing trajectory in the set of once continuously differentiable vector-valued functions that have given values at the endpoints. Recall that this is a free problem of the calculus of variations.
Definition 3.4: Perturbed functions
Given a function we define a family of perturbed functions as:
where is a small real parameter and are functions such that .
Remark 3.3
Note that by choosing sufficiently small we can make and arbitrarily close in the sense of the . We have that:
A pictorial example of a perturbed function for the scalar case is given in Figure 3.4.
We now give a fundamental definition that can be seen as the generalization of the directional derivative to infinite-dimensional function spaces.
Definition 3.5: First Variation of a Functional, Gâteaux Derivative
Let F be a functional defined on a function space . Then, the first variation of at in the direction also called Gâteaux derivative with respect to at , is defined as:
(provided it exists). If the limit exists for all , then is said to be Gâteaux differentiable at
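For concreteness, in one standard notation (an assumption here: $y$ the nominal function, $\eta$ the direction, $\varepsilon$ the small real parameter), the two definitions read:

```latex
% Assumed standard notation; the symbols in the text may differ.
\[
  y_{\varepsilon}(t) = y(t) + \varepsilon\,\eta(t),
  \qquad \eta(a) = \eta(b) = 0
  \quad \text{(perturbed family, Definition 3.4),}
\]
\[
  \delta J(y;\eta)
  = \lim_{\varepsilon \to 0} \frac{J(y+\varepsilon\eta) - J(y)}{\varepsilon}
  = \left.\frac{d}{d\varepsilon}\, J(y+\varepsilon\eta)\right|_{\varepsilon=0}
  \quad \text{(Gâteaux derivative, Definition 3.5).}
\]
```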
Remark 3.4
Note that for fixed and the functional is actually a real-valued function of that is . Then the Gâteaux derivative is simply the first derivative of the function with respect to computed in , i.e.
Remark 3.5
Note that the Gâteaux derivative need not exist in any direction , or it may exist in some directions and not in others. Its existence presupposes that:
Then:
is defined only if this "ordinary" derivative with respect to the real variable exists at . If the Gâteaux derivative exists, then it is unique.
Example 3.6 (Calculation of a Gâteaux Derivative). Consider the functional . In order to take the Gâteaux derivative we can define and then directly differentiate it. That is:
on the other hand, we can directly take the limit:
Example 3.7 (Non-Existence of a Gâteaux Derivative). Consider the functional . Clearly, is defined on the whole space, since each continuous function results in a continuous integrand whose integral is finite. For and we have:
therefore:
and a Gâteaux derivative does not exist at in the direction .
Lemma 3.1: Descent Direction
Let be a functional defined on a normed linear space (e.g., a function space). Suppose that has a strictly negative variation at a point in some direction . Then, cannot be a local minimum point for (in the sense of the norm ).
Proof. Since
hence,
since as the points are eventually in each neighborhood of , irrespective of the norm considered on . Thus, local minimum behavior of , in the sense of Definition 3.3 is not possible in the direction at . □
Definition 3.6: -Admissible Directions
Let be a functional defined on a subset of a linear space and let . Then, a direction , is said to be -admissible at for if:
Remark 3.6
Note that if is a -admissible direction, then the parametrized family of directions for all is also -admissible. This simple fact follows directly from the definition of -admissible directions.
Remark 3.7: Properties of First Variations
It is instrumental to highlight some properties of the first variation of a functional:
Linearity. Let and be two functionals defined on and a -admissible direction. Consider a linear combination of the two functionals (indeed, it is still a functional). Then . This linearity property is inherited from the linearity of the ordinary derivative.
Homogeneity. Let be a functional defined on and a family of -admissible directions, then . In other words, the first variation is a homogeneous operator.
Note also that, in general, the first variation is not a linear operator in , that is, for and -admissible directions , although it may be in some special cases. We will require linear Gâteaux derivatives to prove some results for inequality-constrained problems.
Theorem 3.1: Geometric Necessary Conditions for a Local Minimum
Let be a functional defined on a subset of a normed linear space . Suppose that is a local minimum point for on . Then:
Proof. By contradiction, suppose that there exists a -admissible direction such that . Then, by Lemma 3.1, cannot be a local minimum for . Likewise, there cannot be a -admissible direction such that . Indeed, being -admissible and by Lemma 3.1, cannot be a local minimum. Overall, we must have that for each -admissible direction at . □
Remark 3.8
Note that Theorem 3.1 can be interpreted as the usual necessary condition for optimality for unconstrained NLP problems. Let's define . Note that the scalar function is the value of the functional when perturbed around a minimizing trajectory , that is, we are assuming that is a local minimum. Equivalently, we are stating that is a local minimum for . Therefore, a necessary condition for to be a local minimum is . But from the definition of we also have:
where the last statement must hold for all admissible directions .
We now state some important results needed for the derivations of the necessary conditions of optimality.
Lemma 3.2
Let and . Suppose that:
then
Proof. By contradiction, suppose there is a such that , say positive (the same reasoning for the negative case is analogous and is left to the reader as an exercise). By continuity of , there exists a ball around on which is always positive, formally . Since is arbitrary, we can choose a function that is zero everywhere except in , where we choose it to be a positive, continuous and differentiable function. Then, for such a , we have:
since both and are positive for and thus we reach a contradiction. Hence, . □
Remark 3.9
Note that, with simple modifications, Lemma 3.2 extends to the case:
where and . We recover the case proved by Lemma 3.2 by rewriting the previous equation as:
Then it is sufficient to consider functions such that and apply the result of Lemma 3.2 to each equation . In this way we obtain that the vector-valued function for each .
We state another standard instrumental theorem without proof.
Theorem 3.2: Differentiation Under the Integral Sign
Let such that be a continuous function with continuous partial derivative on Then
is in with the derivative
Remark 3.9 and Theorem 3.2 are used to obtain a set of necessary conditions for optimality known as Euler equations.
Theorem 3.3: Euler’s Necessary Conditions for Optimality
Consider the problem to minimize the functional:
where the functional space is defined as and a continuously differentiable function. Suppose that gives a (local) minimum for on . Then satisfies the Euler equations:
Proof. We start by computing the first variation of the functional at the minimizing trajectory in a -admissible direction :
By Theorem 3.1 we must have:
By Definition 3.6, -admissible directions are functions such that . Note that in this way we have . We modify the expression of the first variation in order to apply Lemma 3.2. Using integration by parts we have:
but since is a -admissible direction we have:
The necessary condition for optimality then becomes:
The last equation satisfies the conditions of Lemma 3.2, therefore we must have:
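For the basic Lagrange functional $J(y)=\int_a^b f(t,y,\dot y)\,dt$ (a standard form, assumed here; the text's symbols may differ), the chain of equalities in the proof can be summarized as:

```latex
% Assumed standard form, with eta(a) = eta(b) = 0 for D-admissible directions.
\[
  \delta J(y^{*};\eta)
  = \int_a^b \Big( f_y\,\eta + f_{\dot y}\,\dot\eta \Big)\,dt
  = \int_a^b \Big( f_y - \frac{d}{dt} f_{\dot y} \Big)\,\eta \, dt
  = 0
  \quad \text{for all admissible } \eta,
\]
and Lemma 3.2 then yields the Euler equations
\[
  \frac{d}{dt}\,\frac{\partial f}{\partial \dot y}\big(t, y^{*}, \dot y^{*}\big)
  = \frac{\partial f}{\partial y}\big(t, y^{*}, \dot y^{*}\big),
  \qquad t \in [a,b].
\]
```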
Definition 3.7: Stationary Function
Each function that satisfies the Euler equations on some interval is called a stationary function for the Lagrangian .
Remark 3.10
Note that the Euler equations are necessary conditions for optimality, therefore a stationary function can be a minimum, a maximum, or neither. In the literature, stationary functions are also called extremals or extremal trajectories.
Example 3.8 (Simplest calculus of variations problem). Given two points in the plane and , find the curve of minimum length that joins them. A schematic picture is drawn in Figure 3.5. Obviously, the solution to this problem is a line joining and but we will work out the solution using the Euler equation. The functional that we want to minimize is the length of the curve, that is we want to solve the problem:
note that, with respect to the previous notation, is the independent variable and we are seeking a function . We have also indicated . The lagrangian for this problem is . Note that it is independent of . The Euler equation for this problem reads:
As a consequence:
Since the denominator is always different from zero, we have , therefore stationary functions have zero second derivative. We can then integrate this condition to derive :
Finally, imposing that , that is, imposing the boundary conditions and , we obtain:
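As a quick numerical sanity check (not part of the text), one can verify that the straight line has the smallest length among curves joining the same endpoints; we discretize the length functional and compare the line against endpoint-preserving sinusoidal perturbations:

```python
import math

def curve_length(y, a=0.0, b=1.0, n=2000):
    """Approximate the arc-length functional J(y) = ∫ sqrt(1 + y'(x)^2) dx
    with finite differences on a uniform grid."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x0, x1 = a + i * h, a + (i + 1) * h
        dy = (y(x1) - y(x0)) / h          # forward-difference slope
        total += math.sqrt(1.0 + dy * dy) * h
    return total

# Join A = (0, 0) and B = (1, 1): the stationary function is the line y = x.
line = lambda x: x

# Perturbations y(x) + eps*sin(pi x) vanish at both endpoints (admissible),
# and each one is strictly longer than the straight line.
for eps in (0.3, 0.1, 0.01):
    perturbed = lambda x, e=eps: x + e * math.sin(math.pi * x)
    assert curve_length(perturbed) > curve_length(line)

print(curve_length(line))  # close to sqrt(2) ≈ 1.41421
```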
Remark 3.11
Depending on the structure of the lagrangian function , Euler equations can be simplified.
f does not explicitly depend on the function .
Suppose , then the Euler equations become .
f does not explicitly depend on the derivative .
Suppose , then the Euler equations become . This is a degenerate case.
f does not explicitly depend on the independent variable .
Suppose , we define the Hamiltonian function . If is a stationary point then the Hamiltonian is constant. The total time-derivative of the Hamiltonian is:
Therefore, an equivalent necessary condition is , where is a constant. This is also known as the Beltrami identity.
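With the convention $H = \dot y\,f_{\dot y} - f$ (an assumption here; sign conventions vary in the literature), the computation sketched in the remark is:

```latex
\[
  \frac{dH}{dt}
  = \ddot y\, f_{\dot y} + \dot y\,\frac{d}{dt} f_{\dot y}
    - f_t - f_y\,\dot y - f_{\dot y}\,\ddot y
  = \dot y \Big( \frac{d}{dt} f_{\dot y} - f_y \Big) - f_t .
\]
On a stationary function the Euler equation annihilates the first term and, since
$f$ does not depend explicitly on $t$, also $f_t = 0$; hence $H$ is constant,
\[
  f - \dot y\,\frac{\partial f}{\partial \dot y} = C
  \qquad \text{(Beltrami identity).}
\]
```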
Example 3.9 (A more challenging problem: The Brachistochrone). Recall the Brachistochrone Problem 3.1:
For the sake of simplicity, we consider a slightly simplified problem. Assume that the initial velocity is zero and that the initial point is the origin, that is . The lagrangian for this problem is . Note that the lagrangian is independent of , hence the Hamiltonian is constant along extremals. Therefore, we have:
Which is equivalent to:
for a new constant . Thus we have:
that is equivalent to:
In order to solve the previous integral we make the substitution , therefore . The integral becomes:
where is a constant to be determined. Overall we have obtained a curve in the parameter that is:
This is the parametric form of a family of cycloids depending on the constants (which in turn depends on ) and . The first boundary condition allows us to easily find in terms of . That is:
Thus, we obtain a smaller family of cycloids passing through the origin, as shown in Figure 3.6, whose equations are:
note that these are all minimum-time curves. The second boundary condition is somewhat more involved and must be solved numerically; the nonlinear system is:
The solution joining points and is shown in red in Figure 3.6. The constant .
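The numerical step can be sketched as follows. A minimal sketch, assuming the common parametrization $x(\theta)=R(\theta-\sin\theta)$, $y(\theta)=R(1-\cos\theta)$ with $y$ measured downward from the origin (the text's constants and conventions may differ); since the ratio $y/x$ along the cycloid is monotone in $\theta$, a simple bisection suffices:

```python
import math

def cycloid_through(xf, yf, tol=1e-12):
    """Find (R, theta_f) such that the cycloid
       x(th) = R*(th - sin th),  y(th) = R*(1 - cos th)
    (y measured downward, starting at the origin -- an assumed convention)
    passes through the target point (xf, yf), with xf, yf > 0."""
    target = yf / xf
    ratio = lambda th: (1.0 - math.cos(th)) / (th - math.sin(th))
    lo, hi = 1e-9, 2.0 * math.pi - 1e-9   # ratio is decreasing on (0, 2*pi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ratio(mid) > target:
            lo = mid                       # solution lies at a larger theta
        else:
            hi = mid
    theta_f = 0.5 * (lo + hi)
    R = xf / (theta_f - math.sin(theta_f))
    return R, theta_f

R, th = cycloid_through(1.0, 0.65)
# Residuals of the two boundary conditions (both numerically zero):
print(R * (th - math.sin(th)) - 1.0, R * (1.0 - math.cos(th)) - 0.65)
```

The target point `(1.0, 0.65)` is only illustrative, not the one used in Figure 3.6.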
Remark 3.12: Hamilton’s Interpretation of Lagrangian Mechanics
According to the principle of least action, a system in motion follows the trajectory that minimizes the action functional:
where are the generalized lagrangian coordinates of a system with degrees of freedom, is the kinetic energy and is the potential energy. The Euler necessary condition reads:
which is the usual Lagrange equation of analytical mechanics. In view of this mechanical interpretation, recall the definition of the Hamiltonian . In general, the kinetic energy has the form . In this case the Hamiltonian reads:
where is the total energy of the system. According to Remark 3.11 if the lagrangian is independent of time the Hamiltonian is constant along an extremal trajectory. In other words the total energy of the system is conserved. Finally, recall that the momentum of a mechanical system can be defined as , hence the special case when the lagrangian is independent of in Remark 3.11 can be viewed as a momentum conservation that is .
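For a kinetic energy quadratic in the generalized velocities, the computation runs as follows (standard notation assumed: $q$ the generalized coordinates, $M(q)$ the symmetric mass matrix):

```latex
\[
  L(q,\dot q) = T - V, \qquad
  T = \tfrac{1}{2}\,\dot q^{\top} M(q)\,\dot q, \qquad
  p = \frac{\partial L}{\partial \dot q} = M(q)\,\dot q,
\]
\[
  H = p^{\top}\dot q - L
    = \dot q^{\top} M(q)\,\dot q - (T - V)
    = 2T - T + V
    = T + V,
\]
the total energy of the system.
```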
Analogously to NLP problems, for problems of the calculus of variations we can also derive a second-order necessary condition. Recall that in the finite-dimensional NLP case the second-order necessary condition was the positive semi-definiteness of the Hessian matrix, which is equivalent to the objective function being locally convex in a neighbourhood of a stationary point. Similar conditions hold in the infinite-dimensional case. First, we need the following definition:
Definition 3.8: Second Variation of a Functional
Let F be a functional defined on a function space . Then, the second variation of at in the direction is defined as:
(provided it exists).
Theorem 3.4: Second-Order Necessary Conditions (Legendre)
Consider the problem to minimize the functional:
where the functional space is defined as such that and a continuously differentiable function. Suppose that gives a (local) minimum for on . Then satisfies the Euler equation along with the so-called Legendre condition:
Proof. Based on the differentiability properties of Theorem 3.2 we can compute the second variation as:
where we have used the compressed notation . Substituting gives:
Define the real-valued function of , . We have that satisfies the necessary conditions for a local minimum of an unconstrained optimization problem, since is a local minimum for the functional . Therefore and . Note that . Hence we have:
The last equation must hold true for each perturbation such that . We use Einstein’s notation to imply the double summation, that is:
to simplify the notation let’s define . Note that due to Schwarz’s theorem is symmetric. Using integration by parts we have:
then, using the symmetry of A and the boundary conditions on , we have:
hence we have:
The necessary condition on the second variation becomes:
by Lemma 3.3, a necessary condition for to be nonnegative is:
Lemma 3.3
Let and be given continuous symmetric matrix functions on and let the quadratic functional:
be defined for all such that . Then, a necessary condition for the quadratic functional to be nonnegative for all such is that for each .
It would be tempting to extend the analogy with NLP problems and look for a sufficient condition. Recall that a sufficient condition for a local minimum of an unconstrained NLP was stationarity together with strict convexity of the Hessian at the candidate solution, see Theorem 2.8. Unfortunately, the requirements that be a stationary function for and that the lagrangian be strictly convex with respect to the third argument are not sufficient for to be a (weak) local minimum of a free problem of the calculus of variations. Additional conditions must hold, such as the absence of points conjugate to the point in ; this condition is called Jacobi's sufficient condition. We refer the reader to the vast literature on this topic. Instead, we give a sufficient condition for a global minimum that relies on the lagrangian being jointly convex in the second and third arguments , respectively. Unfortunately, this condition is seldom satisfied in practice. Recall that for a continuously differentiable function , joint convexity means:
Theorem 3.5: Sufficient Conditions for a Weak Global Minimum
Consider the problem to minimize the functional:
where the functional space is defined as such that and a continuously differentiable function. Suppose that the Lagrangian is [strictly] jointly convex in . If is a stationary function for the Lagrangian (i.e., it satisfies the Euler necessary conditions), then is also a [strict] global minimizer for on .
Proof.
where we have used joint convexity of in the second and third argument. Subsequently, we make use of integration by parts and the fact that and are equal at the boundary, that is and , together with the hypothesis that is a stationary function. Overall we have:
which is the definition of global minimum of a functional. □
Example 3.10. Consider the problem P to minimize the functional
on . The lagrangian is convex in and does not depend on ; then, thanks to Theorem 3.5, every stationary function is a global minimum. The Euler necessary condition for this case reduces to:
therefore by double integration and imposing the boundary condition we obtain:
that is, a straight line joining the origin and the point . The extremal function is the unique global minimum thanks to Theorem 3.5, since it is the unique stationary function of the functional to be minimized.
We now consider slightly different and more involved problems. Consider the problem in the following definition:
Definition 3.9: Calculus of Variation Free End-point problem
Consider the Calculus of Variations problem with free end-point and end-time:
Remark 3.13
The cost functional in Definition 3.9 represents a Bolza problem. Note that neither the end-time nor the end-point is specified, so our search space of functions is in some sense larger. The search space is defined as for sufficiently large, so that it can include . We need to derive additional conditions that an extremal must satisfy; these conditions should allow us to find an equation for the end-point and for the end-time at which this end-point is reached. Informally, we are then looking for additional equations: equations are needed for , while one equation is needed for the final time .
We give without proof an important theorem that is instrumental to derive optimality conditions for this problem.
Theorem 3.6: Rule of Leibniz
Let such that be a continuous function with continuous partial derivative on and . Then
Theorem 3.7: Necessary conditions for free end-point and end-time problems
Consider the problem in Definition 3.9. If is a local minimum point then must satisfy:
Proof. Recall that the geometric necessary condition in Theorem 3.1 does not assume any specific form for the functional . Hence, a necessary condition for a local minimum of a free end-point problem is again for all -admissible directions . Note that, in order to be a -admissible direction for this problem, must be such that . We highlight again the fact that is not given and that need not satisfy any boundary condition at . Since is not fixed, we need to allow variations of the end-time around the optimal time , thus we consider end-time variations of the form . It is essential to notice that the modulating parameter is the same for variations of the end-time and of the trajectory . With this assumption we do not lose generality, since and are arbitrary. The first variation of the Bolza-type functional is:
we carry out the derivative term by term. Note that in the first term we can apply Theorem 3.6. Thus we have:
again we have used the compressed notation . Using integration by parts and the boundary condition we have:
while the second term in the expression of the first variation is:
the geometric necessary condition for optimality is therefore:
since the necessary condition must hold along any -admissible direction and final endpoint variation , the underlying rationale is to choose variations that allow us to derive a set of necessary conditions in the form of differential equations or boundary conditions. First, we consider a family of variations such that and such that also holds at the second endpoint. Note that these variations are indeed -admissible. Therefore we have:
Note that the previous equation is exactly the equation from which we derived the Euler necessary condition. We conclude that the Euler equation is a necessary condition and must hold at a stationary function also in this case.
Note that is yet to be determined. We now consider a second family of -admissible directions such that but is free to vary. Since the Euler equation is satisfied, we have:
we can conclude that:
this is a set of boundary conditions at that implicitly defines . These conditions are called transversality conditions. Finally, we use a family of variations that nullifies the term , hence variations that satisfy while is left free to vary. Again, the Euler equation is satisfied, and thus we have:
Therefore we have:
This is a scalar equation that implicitly defines . Note that the first two terms are precisely the definition of the Hamiltonian evaluated at the optimal final instant. The condition can be rewritten as
Remark 3.14
A similar reasoning holds for more general Bolza problems where the function depends also on the initial time, that is . In this case, neither the initial point nor the initial time is fixed, hence the -admissible directions need not satisfy . Then, the transversality condition and the end-point condition on the Hamiltonian must also hold at the initial time.
Remark 3.15
The free end-point problem of Lagrange (i.e., when ) can be handled in the same way; applying the necessary conditions of Theorem 3.7 yields the so-called natural boundary conditions:
these boundary conditions must hold at a stationary function together with the Euler equations. Note that if is fixed while is left free, only the second equation must hold, and vice versa.
Example 3.11. Consider the problem to minimize the functional:
in which the final time is fixed but not the endpoint . is a scalar weight. In this case, is a strongly convex nonnegative function that penalizes the distance of the final endpoint from the given target . We highlight again the fact that is not given and must be found using the transversality condition. The term under the integral sign is a strongly convex function that penalizes high values of the derivative of . Informally, given an initial point , we want to find the curve that reaches a trade-off between having a small derivative along its trajectory and approaching the target as closely as possible. Obviously, the relative weight of these objectives depends on the magnitude of . For the sake of simplicity, let's consider . The Euler necessary condition reads:
note that we can only assign the initial boundary condition . The transversality condition reads:
Therefore the extremal trajectory is:
note that as we have that , that is, in the limit of an infinite weight the endpoint reaches the target. This would be the solution of the Lagrange problem with the same lagrangian and the endpoint fixed. On the other hand, for a vanishing weight (i.e., ), tends to a constant value throughout the trajectory. Indeed, a constant trajectory is the global minimum of the functional when only the initial point is fixed. The extremal trajectories for a finite are straight lines in the interval that reach the optimal trade-off between minimizing the value of the derivative and reaching the target. A simple solution with and is shown in Figure 3.7. In this simple case we can also compute analytically the optimal value of the cost functional as a function of the weight . Since and , we have:
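The limiting behaviour described above can be checked numerically. A minimal sketch, assuming for illustration the cost $J(y)=\int_0^T \dot y^{\,2}\,dt + \beta\,(y(T)-\bar y)^2$ with $y(0)=y_0$ (our normalization; the text's weights may differ): since the Euler equation gives $\ddot y = 0$, extremals are straight lines, and the transversality condition predicts the slope $s^\ast = \beta(\bar y - y_0)/(1+\beta T)$.

```python
def best_slope(beta, y0=0.0, ybar=1.0, T=1.0):
    """Return the slope s minimizing the cost of the line y(t) = y0 + s*t for
       J = ∫_0^T ydot^2 dt + beta*(y(T) - ybar)^2
    (an assumed illustrative functional), found by a dense scan over s."""
    cost = lambda s: s * s * T + beta * (y0 + s * T - ybar) ** 2
    slopes = [i / 10000.0 for i in range(-20000, 20001)]  # scan s in [-2, 2]
    return min(slopes, key=cost)

# Transversality predicts s* = beta*(ybar - y0) / (1 + beta*T):
for beta in (0.1, 1.0, 10.0, 100.0):
    s_num = best_slope(beta)
    s_pred = beta * 1.0 / (1.0 + beta)
    assert abs(s_num - s_pred) < 1e-3

# beta -> infinity: slope -> (ybar - y0)/T = 1, the curve reaches the target;
# beta -> 0: slope -> 0, the trajectory tends to a constant.
```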