In this section, we shall consider nonlinear programming problems with equality constraints of the form:
$$\min_{x \in \mathbb{R}^n} \; f(x) \quad \text{subject to} \quad h_i(x) = 0, \quad i = 1, \ldots, m,$$
where $f: \mathbb{R}^n \to \mathbb{R}$ and $h_i: \mathbb{R}^n \to \mathbb{R}$, $i = 1, \ldots, m$, with $m \le n$.
Based on the material presented in Section 2.6, it is tempting to convert this problem into an inequality constrained problem, by replacing each equality constraint $h_i(x) = 0$ by the two inequality constraints $h_i(x) \le 0$ and $-h_i(x) \le 0$. Given a feasible point $\bar{x}$, both inequality constraints would be active at $\bar{x}$. Therefore, there could exist no vector $d$ such that $\nabla h_i(\bar{x})^T d < 0$ and $-\nabla h_i(\bar{x})^T d < 0$ simultaneously, i.e., the cone of improving feasible directions is empty at every feasible point. In other words, the geometric conditions developed in the previous section for inequality constrained problems are trivially satisfied by all feasible solutions and, hence, are not informative. A different approach must therefore be used to deal with equality constrained problems. After a number of preliminary results in Section 2.7.1, we shall describe the method of Lagrange multipliers for equality constrained problems in Section 2.7.2.
An equality constraint $h(x) = 0$ defines a set in $\mathbb{R}^n$, which can be seen as a hypersurface.
When considering $m$ equality constraints $h_1(x) = 0, \ldots, h_m(x) = 0$, their intersection forms a (possibly empty) set $S := \{x \in \mathbb{R}^n : h_i(x) = 0, \; i = 1, \ldots, m\}$.
Throughout this section, we shall assume that the equality constraints $h_i$, $i = 1, \ldots, m$, are differentiable; the set $S$ is then said to be a differentiable manifold (or smooth manifold). Associated with a point on a differentiable manifold is the tangent set at that point. To formalize this notion, we start by defining curves on a manifold. A curve $\xi$ on a manifold $S$ is a continuous application $\xi: I \subset \mathbb{R} \to S$, i.e., a family of points $\xi(t) \in S$ continuously parameterized by $t$ in an interval $I$. A curve is said to pass through the point $\bar{x}$ if $\bar{x} = \xi(\bar{t})$ for some $\bar{t} \in I$; the derivative of a curve at $\bar{t}$, provided it exists, is defined as $\dot{\xi}(\bar{t}) := \lim_{\theta \to 0} \frac{\xi(\bar{t} + \theta) - \xi(\bar{t})}{\theta}$. A curve is said to be differentiable (or smooth) if a derivative exists for each $t \in I$.
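For instance, take $S = \{x \in \mathbb{R}^2 : x_1^2 + x_2^2 - 1 = 0\}$, the unit circle. The application $\xi(t) = (\cos t, \sin t)$, $t \in \mathbb{R}$, is a smooth curve on $S$ passing through $\bar{x} = (1, 0)$ at $\bar{t} = 0$, with derivative
$$\dot{\xi}(t) = (-\sin t, \cos t), \qquad \dot{\xi}(0) = (0, 1),$$
the latter being a vector tangent to the circle at $\bar{x}$.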
Definition 2.21: Tangent Set
Let $S$ be a (differentiable) manifold in $\mathbb{R}^n$, and let $\bar{x} \in S$. Consider the collection of all the continuously differentiable curves on $S$ passing through $\bar{x}$. Then, the collection of all the vectors tangent to these curves at $\bar{x}$ is said to be the tangent set to $S$ at $\bar{x}$, denoted by $T_{\bar{x}}S$.
If the constraints are regular, in the sense of Definition 2.22 below, then $S$ is (locally) of dimension $n - m$, and $T_{\bar{x}}S$ constitutes a subspace of dimension $n - m$, called the tangent space.
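For example, a single constraint ($m = 1$) such as $h(x) = x_1^2 + x_2^2 + x_3^2 - 1 = 0$ defines the unit sphere in $\mathbb{R}^3$, a manifold of dimension $n - m = 2$; every point $\bar{x}$ of the sphere is regular since $\nabla h(\bar{x}) = 2\bar{x} \neq 0$, and the tangent space $T_{\bar{x}}S = \{d \in \mathbb{R}^3 : \bar{x}^T d = 0\}$ is a plane, of dimension $2$.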
Definition 2.22: Regular Point (for a Set of Equality Constraints)
Let $h_i$, $i = 1, \ldots, m$, be differentiable functions on $\mathbb{R}^n$ and consider the set $S := \{x \in \mathbb{R}^n : h_i(x) = 0, \; i = 1, \ldots, m\}$. A point $\bar{x} \in S$ is said to be a regular point if the gradient vectors $\nabla h_i(\bar{x})$, $i = 1, \ldots, m$, are linearly independent, i.e.,
$$\operatorname{rank}\begin{pmatrix} \nabla h_1(\bar{x}) & \nabla h_2(\bar{x}) & \cdots & \nabla h_m(\bar{x}) \end{pmatrix} = m.$$
Lemma 2.3: Algebraic Characterization of a Tangent Space
Let $h_i$, $i = 1, \ldots, m$, be differentiable functions on $\mathbb{R}^n$ and consider the set $S := \{x \in \mathbb{R}^n : h_i(x) = 0, \; i = 1, \ldots, m\}$. At a regular point $\bar{x} \in S$, the tangent space is such that
$$T_{\bar{x}}S = \{d \in \mathbb{R}^n : \nabla h(\bar{x})^T d = 0\},$$
where $h := (h_1, \ldots, h_m)^T$ and $\nabla h(\bar{x}) := (\nabla h_1(\bar{x}), \ldots, \nabla h_m(\bar{x}))$ is the $n \times m$ matrix of constraint gradients.
Proof. ... □
Recall that $\nabla h(\bar{x})^T$ is the Jacobian of the constraints. Therefore, the tangent directions $d \in T_{\bar{x}}S$ must be orthogonal to the gradient of each constraint at $\bar{x}$. Note also that the condition $\nabla h(\bar{x})^T d = 0$ defines the kernel of the Jacobian at $\bar{x}$, that is, the set $\ker(\nabla h(\bar{x})^T) = \{d \in \mathbb{R}^n : \nabla h(\bar{x})^T d = 0\}$. Since we have assumed that $\bar{x}$ is a regular point, the Jacobian has full rank $m$, and by the rank–nullity theorem the dimension of the kernel is $n - m$, which is the dimension of the tangent space.
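As a quick numerical illustration (a minimal sketch; the sphere constraint below is our own choice of example), an orthonormal basis of $T_{\bar{x}}S$ can be computed as the null space of the constraint Jacobian:

```python
import numpy as np
from scipy.linalg import null_space

# Constraint h(x) = x1^2 + x2^2 + x3^2 - 1 = 0 (unit sphere: n = 3, m = 1)
def jac_h(x):
    # Jacobian of h: a 1 x 3 matrix whose single row is grad h(x)^T = 2 x^T
    return 2.0 * x.reshape(1, -1)

x_bar = np.array([0.0, 0.0, 1.0])   # a regular point on the sphere
W = null_space(jac_h(x_bar))        # n x (n - m) orthonormal basis of the tangent space
print(W.shape)                      # (3, 2): the tangent space has dimension n - m = 2
print(jac_h(x_bar) @ W)             # ~0: basis vectors are orthogonal to grad h(x_bar)
```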
The idea behind the method of Lagrange multipliers for solving equality constrained NLP problems of the form
$$\min_{x \in \mathbb{R}^n} \; f(x) \quad \text{subject to} \quad h_i(x) = 0, \quad i = 1, \ldots, m,$$
is to restrict the search for a minimum to the manifold $S := \{x \in \mathbb{R}^n : h_i(x) = 0, \; i = 1, \ldots, m\}$. In other words, we derive optimality conditions by considering the value of the objective function along curves on the manifold $S$ passing through the optimal point.
The following theorem shows that the tangent space $T_{x^*}S$ at a regular (local) minimum point $x^*$ is orthogonal to the gradient of the objective function at $x^*$. This important fact is illustrated in Fig. ?? in the case of a single equality constraint.
Theorem 2.12: Geometric Necessary Condition for a Local Minimum
Let $f: \mathbb{R}^n \to \mathbb{R}$ and $h_i: \mathbb{R}^n \to \mathbb{R}$, $i = 1, \ldots, m$, be continuously differentiable functions on $\mathbb{R}^n$. Suppose that $x^*$ is a local minimum point of the problem to minimize $f(x)$ subject to the constraints $h(x) = 0$. Then, $\nabla f(x^*)$ is orthogonal to the tangent space $T_{x^*}S$, that is:
$$\nabla f(x^*)^T d = 0, \quad \forall d \in T_{x^*}S.$$
Proof. By contradiction, assume that there exists a $d \in T_{x^*}S$ such that $\nabla f(x^*)^T d \neq 0$. Let $\xi: I := [-a, a] \to S$, $a > 0$, be any smooth curve passing through $x^*$ with $\xi(0) = x^*$ and $\dot{\xi}(0) = d$. Let also $\varphi$ be the function defined as $\varphi(t) := f(\xi(t))$, $\forall t \in I$. Since $x^*$ is a local minimum of $f$ on $S$, by Definition 2.11, we have
$$\exists \delta > 0 \; \text{such that} \; \varphi(t) = f(\xi(t)) \ge f(x^*) = \varphi(0), \quad \forall t \in (-\delta, \delta) \cap I.$$
It follows that $t^* = 0$ is an unconstrained (local) minimum point for $\varphi$, and
$$0 = \dot{\varphi}(0) = \nabla f(x^*)^T \dot{\xi}(0) = \nabla f(x^*)^T d.$$
We thus get a contradiction with the fact that $\nabla f(x^*)^T d \neq 0$. □
Next, we take advantage of the foregoing geometric characterization, and derive first-order necessary conditions for equality constrained NLP problems.
Theorem 2.13: First-Order Necessary Conditions
Let $f: \mathbb{R}^n \to \mathbb{R}$ and $h_i: \mathbb{R}^n \to \mathbb{R}$, $i = 1, \ldots, m$, be continuously differentiable functions on $\mathbb{R}^n$. Consider the problem to minimize $f(x)$ subject to the constraints $h(x) = 0$. If $x^*$ is a local minimum and is a regular point of the constraints, then there exists a unique vector $\lambda^* \in \mathbb{R}^m$ such that:
$$\nabla f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla h_i(x^*) = 0.$$
Proof. Since $x^*$ is a local minimum of $f$ on $S$, by Theorem 2.12, we have $\nabla f(x^*)^T d = 0$ for all $d$ such that $\nabla h(x^*)^T d = 0$, i.e. the system
$$\nabla f(x^*)^T d < 0, \qquad \nabla h(x^*)^T d = 0$$
is inconsistent. Consider the following two sets:
$$C_1 := \{(z_1, z_2) \in \mathbb{R} \times \mathbb{R}^m : z_1 = \nabla f(x^*)^T d, \; z_2 = \nabla h(x^*)^T d, \; d \in \mathbb{R}^n\},$$
$$C_2 := \{(z_1, z_2) \in \mathbb{R} \times \mathbb{R}^m : z_1 < 0, \; z_2 = 0\}.$$
Clearly, $C_1$ and $C_2$ are convex and $C_1 \cap C_2 = \emptyset$. Then, by the separation theorem (Theorem 2.14 below), there exists a nonzero vector $(\mu, \lambda) \in \mathbb{R} \times \mathbb{R}^m$ such that
$$\mu \, \nabla f(x^*)^T d + \lambda^T \left[ \nabla h(x^*)^T d \right] \ge \mu z_1 + \lambda^T z_2, \quad \forall d \in \mathbb{R}^n, \; \forall (z_1, z_2) \in C_2.$$
Letting $z_2 = 0$ and since $z_1$ can be made an arbitrarily large negative number, it follows that $\mu \ge 0$ (note that if $\mu < 0$, the right-hand side could be made positive and arbitrarily large, whereas the left-hand side is fixed for each $d$). Also, letting $(z_1, z_2) \to (0, 0)$, we must have $\mu \, \nabla f(x^*)^T d + \lambda^T \nabla h(x^*)^T d \ge 0$, for each $d \in \mathbb{R}^n$. In particular, letting $d := -\left[ \mu \, \nabla f(x^*) + \nabla h(x^*) \lambda \right]$, it follows that $-\| \mu \, \nabla f(x^*) + \nabla h(x^*) \lambda \|^2 \ge 0$, and thus,
$$\mu \, \nabla f(x^*) + \nabla h(x^*) \lambda = 0.$$
Finally, note that $\mu > 0$, for otherwise the above equation would contradict the assumption of linear independence of $\nabla h_1(x^*), \ldots, \nabla h_m(x^*)$. The result follows by letting $\lambda^* := \frac{1}{\mu} \lambda$, and noting that the linear independence assumption implies the uniqueness of these Lagrange multipliers. □
Theorem 2.14: Separation of Two Convex Sets
Let $C_1$ and $C_2$ be two nonempty, convex sets in $\mathbb{R}^n$ and suppose that $C_1 \cap C_2 = \emptyset$. Then, there exists a hyperplane that separates $C_1$ and $C_2$; that is, there exists a nonzero vector $p \in \mathbb{R}^n$ such that
$$p^T x_1 \ge p^T x_2, \quad \forall x_1 \in C_1, \; \forall x_2 \in C_2.$$
Remark 2.22: Obtaining Candidate Solution Points
The first-order necessary conditions
$$\nabla f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla h_i(x^*) = 0,$$
together with the constraints
$$h_i(x^*) = 0, \quad i = 1, \ldots, m,$$
give a total of $n + m$ (typically nonlinear) equations in the $n + m$ variables $(x^*, \lambda^*)$. Hence, these conditions are complete in the sense that they determine, at least locally, a unique solution. However, as in the unconstrained case, a solution to the first-order necessary conditions need not be a (local) minimum of the original problem; it could very well correspond to a (local) maximum point, or some kind of saddle point. These considerations are illustrated in Example 2.15 below.
Remark 2.23: Regularity-Type Assumption
It is important to note that for a local minimum to satisfy the foregoing first-order conditions and, in particular, for a unique Lagrange multiplier vector $\lambda^*$ to exist, it is necessary that the equality constraints satisfy a regularity condition. In other words, the first-order conditions may not hold at a local minimum point that is non-regular. An illustration of these considerations is provided in Example 2.16.
There exist a number of similarities with the constraint qualification needed for a local minimizer of an inequality constrained NLP problem to be a KKT point; in particular, the condition that the minimum point be a regular point for the constraints corresponds to LICQ (see Remark 2.21).
Remark 2.24: Lagrangian
It is convenient to introduce the Lagrangian $\mathcal{L}: \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ associated with the constrained problem, by adjoining the cost and constraint functions as:
$$\mathcal{L}(x, \lambda) := f(x) + \lambda^T h(x) = f(x) + \sum_{i=1}^{m} \lambda_i h_i(x).$$
Thus, if $x^*$ is a local minimum which is regular, the first-order necessary conditions are written as
$$\nabla_x \mathcal{L}(x^*, \lambda^*) = \nabla f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla h_i(x^*) = 0,$$
$$\nabla_\lambda \mathcal{L}(x^*, \lambda^*) = h(x^*) = 0,$$
the latter equations being simply a restatement of the constraints. Note that the solution of the original problem typically corresponds to a saddle point of the Lagrangian function.
Example 2.15 (Regular Case). Consider the problem
$$\min_{x \in \mathbb{R}^2} \; f(x) := x_1 + x_2 \quad \text{subject to} \quad h(x) := x_1^2 + x_2^2 - 2 = 0.$$
Observe first that every feasible point is a regular point for the equality constraint, since $\nabla h(x) = (2x_1, 2x_2)^T$ vanishes only at the origin (the point $(0,0)$ being infeasible). Therefore, every local minimum is a stationary point of the Lagrangian function by Theorem 2.13.
The gradient vectors $\nabla f(x)$ and $\nabla h(x)$ are given by
$$\nabla f(x) = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad \text{and} \quad \nabla h(x) = \begin{pmatrix} 2x_1 \\ 2x_2 \end{pmatrix},$$
so that the first-order necessary conditions read
$$1 + 2\lambda x_1 = 0,$$
$$1 + 2\lambda x_2 = 0,$$
$$x_1^2 + x_2^2 = 2.$$
These three equations can be solved for the three unknowns $x_1$, $x_2$ and $\lambda$. Two candidate local minimum points are obtained: (i) $x^* = (-1, -1)$, $\lambda^* = \frac{1}{2}$, and (ii) $x^* = (1, 1)$, $\lambda^* = -\frac{1}{2}$. These results are illustrated in Fig. 2.8. It can be seen that only the former actually corresponds to a local minimum point, while the latter gives a local maximum point.
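These calculations are easily verified symbolically, e.g., with SymPy (a minimal sketch; the problem data are those reconstructed above):

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
f = x1 + x2
h = x1**2 + x2**2 - 2
L = f + lam * h  # Lagrangian

# Stationarity in x plus the constraint: n + m = 3 equations in 3 unknowns
eqs = [sp.diff(L, x1), sp.diff(L, x2), h]
print(sp.solve(eqs, [x1, x2, lam], dict=True))
# Two candidates: {x1: -1, x2: -1, lam: 1/2} and {x1: 1, x2: 1, lam: -1/2}
```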
Example 2.16 (Non-Regular Case). Consider the problem depicted in Fig. 1.13. This problem has only one feasible point, namely $x^*$; that is, $x^*$ is also the unique global minimum of the problem. However, at this point, the regularity condition fails, i.e., $x^*$ is not a regular point, and $\nabla f(x^*)$ does not lie in the span of the constraint gradients; hence the first-order conditions
$$\nabla f(x^*) + \sum_{i=1}^{m} \lambda_i \nabla h_i(x^*) = 0$$
cannot be satisfied for any $\lambda \in \mathbb{R}^m$. This illustrates the fact that a minimum point may not be a stationary point for the Lagrangian if that point is non-regular.
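A classical instance of this pathology (not necessarily the problem of the original figure, but exhibiting exactly the same behavior) is
$$\min_{x \in \mathbb{R}^2} \; x_1 + x_2 \quad \text{s.t.} \quad h_1(x) := (x_1 - 1)^2 + x_2^2 - 1 = 0, \quad h_2(x) := (x_1 - 2)^2 + x_2^2 - 4 = 0.$$
The two circles are internally tangent at the origin, so $x^* = (0, 0)$ is the only feasible point. There, $\nabla h_1(x^*) = (-2, 0)^T$ and $\nabla h_2(x^*) = (-4, 0)^T$ are linearly dependent, while $\nabla f(x^*) = (1, 1)^T$ has a nonzero second component that no combination $\lambda_1 \nabla h_1(x^*) + \lambda_2 \nabla h_2(x^*)$ can cancel.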
The following theorem provides second-order necessary conditions for a point to be a local minimum of an NLP problem with equality constraints.
Theorem 2.15: Second-Order Necessary Conditions
Let $f: \mathbb{R}^n \to \mathbb{R}$ and $h_i: \mathbb{R}^n \to \mathbb{R}$, $i = 1, \ldots, m$, be twice continuously differentiable functions on $\mathbb{R}^n$. Consider the problem to minimize $f(x)$ subject to the constraints $h(x) = 0$. If $x^*$ is a local minimum and is a regular point of the constraints, then there exists a unique vector $\lambda^* \in \mathbb{R}^m$ such that
$$\nabla f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla h_i(x^*) = 0,$$
and
$$d^T \left( \nabla^2 f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla^2 h_i(x^*) \right) d \ge 0, \quad \forall d \in T_{x^*}S.$$
Proof. Note first that the existence and uniqueness of $\lambda^*$ directly follows from Theorem 2.13. Let $d$ be an arbitrary direction in $T_{x^*}S$; that is, $\nabla h(x^*)^T d = 0$, since $x^*$ is a regular point (see Lemma 2.3). Consider any twice differentiable curve $\xi: I \to S$ passing through $x^*$ with $\xi(0) = x^*$ and $\dot{\xi}(0) = d$. Let $\varphi$ be the function defined as $\varphi(t) := f(\xi(t))$, $\forall t \in I$. Since $x^*$ is a local minimum of $f$ on $S$, $t^* = 0$ is an unconstrained (local) minimum point for $\varphi$. By Theorem 2.4, it follows that
$$0 \le \ddot{\varphi}(0) = \dot{\xi}(0)^T \, \nabla^2 f(x^*) \, \dot{\xi}(0) + \nabla f(x^*)^T \ddot{\xi}(0).$$
Furthermore, differentiating the relation $\lambda^{*T} h(\xi(t)) = 0$ twice with respect to $t$, we obtain
$$\dot{\xi}(0)^T \left( \sum_{i=1}^{m} \lambda_i^* \nabla^2 h_i(x^*) \right) \dot{\xi}(0) + \left( \sum_{i=1}^{m} \lambda_i^* \nabla h_i(x^*) \right)^T \ddot{\xi}(0) = 0.$$
Adding the last two equations, and noting that the terms involving $\ddot{\xi}(0)$ cancel by the first-order condition $\nabla f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla h_i(x^*) = 0$, yields
$$0 \le d^T \left( \nabla^2 f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla^2 h_i(x^*) \right) d,$$
and this condition must hold for every $d$ such that $\nabla h(x^*)^T d = 0$, i.e., for every $d \in T_{x^*}S$. □
It is useful to shed more light on the derivation in the previous proof. Thus, we carry out the derivatives of $\varphi$ explicitly. Note that since the curve $\xi$ lies in $S$, we have $h_i(\xi(t)) = 0$ for all $t \in I$. Hence $\lambda_i^* h_i(\xi(t))$ is identically zero for all $t$ (i.e., $\sum_{i=1}^{m} \lambda_i^* h_i(\xi(t)) = 0$). We can thus expand $\varphi(t)$ as
$$\varphi(t) = f(\xi(t)) = f(\xi(t)) + \sum_{i=1}^{m} \lambda_i^* h_i(\xi(t)) = \mathcal{L}(\xi(t), \lambda^*).$$
The first total derivative is:
$$\dot{\varphi}(t) = \nabla_x \mathcal{L}(\xi(t), \lambda^*)^T \, \dot{\xi}(t).$$
We can now differentiate again with respect to time to derive the second total derivative:
$$\ddot{\varphi}(t) = \dot{\xi}(t)^T \, \nabla_{xx}^2 \mathcal{L}(\xi(t), \lambda^*) \, \dot{\xi}(t) + \nabla_x \mathcal{L}(\xi(t), \lambda^*)^T \, \ddot{\xi}(t).$$
Remember that at $t = 0$ we have $\xi(0) = x^*$ and $\dot{\xi}(0) = d$, and that the first-order necessary condition $\nabla_x \mathcal{L}(x^*, \lambda^*) = 0$ holds. Therefore, the second derivative of $\varphi$ at $t = 0$ gives:
$$\ddot{\varphi}(0) = d^T \, \nabla_{xx}^2 \mathcal{L}(x^*, \lambda^*) \, d.$$
Hence:
$$0 \le \ddot{\varphi}(0) = d^T \left( \nabla^2 f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla^2 h_i(x^*) \right) d, \quad \forall d \in T_{x^*}S.$$
Using the last equation, it is easier to follow the proof of Theorem 2.15.
Remark 2.25: Eigenvalues in Tangent Space
In the foregoing theorem, it is shown that the matrix $\nabla_{xx}^2 \mathcal{L}(x^*, \lambda^*)$ restricted to the subspace $T_{x^*}S$ plays a key role. Geometrically, the restriction of $\nabla_{xx}^2 \mathcal{L}(x^*, \lambda^*)$ to $T_{x^*}S$ corresponds to the projection
$$L_T : d \in T_{x^*}S \; \mapsto \; P \left[ \nabla_{xx}^2 \mathcal{L}(x^*, \lambda^*) \, d \right],$$
where $P$ denotes the orthogonal projection onto $T_{x^*}S$.
A vector $y \in T_{x^*}S$ is said to be an eigenvector of $L_T$ if there is a real number $\mu$ such that:
$$L_T \, y = \mu \, y;$$
the corresponding $\mu$ is said to be an eigenvalue of $L_T$. (These definitions coincide with the usual definitions of eigenvector and eigenvalue for real matrices.)
Now, to obtain a matrix representation for $L_T$, it is necessary to introduce a basis of the subspace $T_{x^*}S$, say $W := (w_1, \ldots, w_{n-m})$, chosen orthonormal for convenience. (Note that $W$ is an $n \times (n-m)$ matrix.) Then, the eigenvalues of $L_T$ are the same as those of the $(n-m) \times (n-m)$ matrix $W^T \, \nabla_{xx}^2 \mathcal{L}(x^*, \lambda^*) \, W$; in particular, they are independent of the particular choice of basis $W$.
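A small numerical sketch of this construction, using the data of Example 2.15 as reconstructed above, at candidate point (i):

```python
import numpy as np
from scipy.linalg import null_space

# Example 2.15, candidate (i): x* = (-1, -1), lambda* = 1/2
x_star, lam = np.array([-1.0, -1.0]), 0.5
grad_h = 2.0 * x_star.reshape(1, -1)      # Jacobian row: grad h(x*)^T = (2 x1, 2 x2)
H = lam * 2.0 * np.eye(2)                 # Hessian of the Lagrangian: f is linear, so H = 2*lam*I
W = null_space(grad_h)                    # orthonormal basis of the tangent space T_{x*}S
eig = np.linalg.eigvalsh(W.T @ H @ W)     # eigenvalues of the restricted Hessian
print(eig)                                # [1.]: positive, so the second-order condition holds
```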
Example 2.17 (Regular Case Continued). Consider again Example 2.15. Two candidate local minimum points, (i) $x^* = (-1, -1)$, $\lambda^* = \frac{1}{2}$, and (ii) $x^* = (1, 1)$, $\lambda^* = -\frac{1}{2}$, were obtained on application of the first-order necessary conditions. The Hessian matrix of the Lagrangian function is given by
$$\nabla_{xx}^2 \mathcal{L}(x, \lambda) = \nabla^2 f(x) + \lambda \nabla^2 h(x) = \begin{pmatrix} 2\lambda & 0 \\ 0 & 2\lambda \end{pmatrix},$$
and a basis of the tangent subspace at a feasible point $x = (x_1, x_2)$ is
$$W = \begin{pmatrix} -x_2 \\ x_1 \end{pmatrix}.$$
Therefore,
$$W^T \, \nabla_{xx}^2 \mathcal{L}(x, \lambda) \, W = 2\lambda \left( x_1^2 + x_2^2 \right).$$
Since all the feasible $x_1$ and $x_2$ must satisfy the constraint $x_1^2 + x_2^2 = 2$, we have:
$$W^T \, \nabla_{xx}^2 \mathcal{L}(x, \lambda) \, W = 4\lambda.$$
In particular, for the candidate solution point (i), we have
$$W^T \, \nabla_{xx}^2 \mathcal{L}(x^*, \lambda^*) \, W = 2 > 0,$$
hence satisfying the second-order necessary conditions (in fact, this point also satisfies the second-order sufficient conditions of optimality discussed hereafter). On the other hand, for the candidate solution point (ii), we get
$$W^T \, \nabla_{xx}^2 \mathcal{L}(x^*, \lambda^*) \, W = -2 < 0,$$
which does not satisfy the second-order necessary requirement, so this point cannot be a local minimum.
The conditions given in Theorems 2.13 and 2.15 are necessary conditions that must hold at each local minimum point. Yet, a point satisfying these conditions may not be a local minimum. The following theorem provides sufficient conditions for a stationary point of the Lagrangian function to be a (local) minimum, provided that the Hessian matrix of the Lagrangian function is positive definite along directions in the tangent space of the constraints.
Theorem 2.16: Second-Order Sufficient Conditions
Let $f: \mathbb{R}^n \to \mathbb{R}$ and $h_i: \mathbb{R}^n \to \mathbb{R}$, $i = 1, \ldots, m$, be twice continuously differentiable functions on $\mathbb{R}^n$. Consider the problem to minimize $f(x)$ subject to the constraints $h(x) = 0$. If $x^*$ and $\lambda^*$ satisfy
$$\nabla_x \mathcal{L}(x^*, \lambda^*) = 0, \qquad \nabla_\lambda \mathcal{L}(x^*, \lambda^*) = 0,$$
and
$$d^T \, \nabla_{xx}^2 \mathcal{L}(x^*, \lambda^*) \, d > 0, \quad \forall d \neq 0 \; \text{such that} \; \nabla h(x^*)^T d = 0,$$
where $\mathcal{L}(x, \lambda) = f(x) + \lambda^T h(x)$, then $x^*$ is a strict local minimum.
Proof. Consider the augmented Lagrangian function
$$\bar{\mathcal{L}}(x, \lambda) := f(x) + \lambda^T h(x) + \frac{c}{2} \, \| h(x) \|^2,$$
where $c$ is a scalar. We have
$$\nabla_x \bar{\mathcal{L}}(x, \lambda) = \nabla_x \mathcal{L}(x, \bar{\lambda}),$$
$$\nabla_{xx}^2 \bar{\mathcal{L}}(x, \lambda) = \nabla_{xx}^2 \mathcal{L}(x, \bar{\lambda}) + c \, \nabla h(x) \nabla h(x)^T,$$
where $\bar{\lambda} := \lambda + c \, h(x)$. Since $(x^*, \lambda^*)$ satisfy the sufficient conditions, and by Lemma 2.4, we obtain
$$\nabla_x \bar{\mathcal{L}}(x^*, \lambda^*) = 0, \qquad \nabla_{xx}^2 \bar{\mathcal{L}}(x^*, \lambda^*) \succ 0,$$
for sufficiently large $c$. The function $\bar{\mathcal{L}}(\cdot, \lambda^*)$ being positive definite at $x^*$ (i.e., its Hessian there is positive definite), there exist $\rho > 0$ and $\delta > 0$ such that
$$\bar{\mathcal{L}}(x, \lambda^*) \ge \bar{\mathcal{L}}(x^*, \lambda^*) + \frac{\rho}{2} \, \| x - x^* \|^2, \quad \forall x \; \text{with} \; \| x - x^* \| < \delta.$$
Finally, since $\bar{\mathcal{L}}(x, \lambda^*) = f(x)$ when $h(x) = 0$, we get
$$f(x) \ge f(x^*) + \frac{\rho}{2} \, \| x - x^* \|^2, \quad \forall x \in S \; \text{with} \; \| x - x^* \| < \delta,$$
i.e., $x^*$ is a strict local minimum. □
Example 2.18. Consider the problem (1.18) of minimizing $f(x)$ subject to a single equality constraint $h(x) = 0$. The first-order conditions for this problem are
$$\nabla f(x^*) + \lambda^* \nabla h(x^*) = 0,$$
together with the equality constraint. It is easily checked that a point $(x^*, \lambda^*)$ satisfies these conditions. Moreover, the Hessian matrix of the Lagrangian, $\nabla_{xx}^2 \mathcal{L}(x^*, \lambda^*)$, can be evaluated at this point, and a basis $W$ of the tangent space to the constraint at $x^*$ is readily obtained. We thus obtain the restricted Hessian $W^T \, \nabla_{xx}^2 \mathcal{L}(x^*, \lambda^*) \, W$, which is clearly a positive definite matrix. Hence, $x^*$ is a strict local minimum of (1.18). (Interestingly enough, the Hessian matrix of the objective function itself is indefinite at $x^*$ in this case.)
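A small problem of the same flavor (data chosen here for illustration, not necessarily those of the original example) exhibits an indefinite objective Hessian together with a strict constrained minimum:
$$\min_{x \in \mathbb{R}^2} \; f(x) := -x_1 x_2 \quad \text{s.t.} \quad h(x) := x_1 + x_2 - 2 = 0.$$
The first-order conditions give $x^* = (1, 1)$, $\lambda^* = 1$. Since $h$ is linear, $\nabla_{xx}^2 \mathcal{L}(x^*, \lambda^*) = \nabla^2 f(x^*) = \begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix}$, which is indefinite. Yet, with the tangent-space basis $W = (1, -1)^T$,
$$W^T \, \nabla_{xx}^2 \mathcal{L}(x^*, \lambda^*) \, W = 2 > 0,$$
so $x^*$ is a strict local minimum on $S$; indeed, along the constraint $x_2 = 2 - x_1$, the objective reduces to $x_1^2 - 2x_1$, which is minimized at $x_1 = 1$.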
We close this section by providing insight into the Lagrange multipliers.
Remark 2.26: A first Interpretation of the Lagrange Multipliers
The concept of Lagrange multipliers makes it possible to adjoin the constraints to the objective function. That is, one can view constrained optimization as a search for a vector $x^*$ at which the gradient of the objective function is a linear combination of the gradients of the constraints.
Another insightful interpretation of the Lagrange multipliers is as follows. Consider the set of perturbed problems
$$v(u) := \min \{ f(x) : h(x) = u \}.$$
Suppose that there is a unique regular solution point for each $u$, and let $\{x^*(u)\}$ denote the evolution of the optimal solution point as a function of the perturbation parameter $u \in \mathbb{R}^m$. Clearly,
$$v(0) = f(x^*) \quad \text{and} \quad v(u) = f(x^*(u)).$$
Moreover, since $h(x^*(u)) = u$ for each $u$, we have:
$$\nabla_u \left[ h(x^*(u)) \right]_{u=0} = \nabla h(x^*)^T \, D_u x^*(0) = I_m,$$
where $D_u x^*(0)$ denotes the $n \times m$ Jacobian of $x^*(\cdot)$ at $u = 0$.
Denoting by $\lambda_i^*$ the Lagrange multiplier associated with the $i$th constraint in the original problem, we have
$$\nabla_u v(0) = D_u x^*(0)^T \, \nabla f(x^*) = -D_u x^*(0)^T \, \nabla h(x^*) \, \lambda^* = -\lambda^*.$$
Therefore, the Lagrange multiplier $\lambda_i^*$ can be interpreted as the sensitivity of the objective function with respect to the $i$th constraint $h_i(x) = 0$. Said differently, $\lambda_i^*$ indicates how much the optimal cost would change if the constraints were perturbed. This interpretation extends straightforwardly to NLP problems having inequality constraints. The Lagrange multiplier of an active constraint, say $g_i(x^*) = 0$, can be interpreted as the sensitivity of $f(x^*)$ with respect to a change in that constraint, as $g_i(x) \le u_i$; in this case, the nonnegativity of the Lagrange multiplier follows from the fact that by increasing $u_i$, the feasible region is relaxed, hence the optimal cost cannot increase. Regarding inactive constraints, the sensitivity interpretation also explains why the Lagrange multipliers are zero, as any infinitesimal change in the value of these constraints leaves the optimal cost value unchanged.
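As an illustration on Example 2.15 (with the data as reconstructed above), perturb the constraint to $x_1^2 + x_2^2 - 2 = u$. The perturbed solution is $x^*(u) = \left( -\sqrt{(2+u)/2}, \, -\sqrt{(2+u)/2} \right)$, so that
$$v(u) = -\sqrt{2(2+u)}, \qquad \nabla_u v(0) = -\frac{1}{2} = -\lambda^*,$$
in agreement with the multiplier $\lambda^* = \frac{1}{2}$ found earlier.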
Remark 2.27: A second Interpretation of the Lagrange Multipliers: Mechanical Analogy
It is possible to give an interesting physical interpretation of the KKT conditions and the associated Lagrange multipliers. Consider a ball on a hilly terrain, where the elevation of the terrain is described by the function $f(x)$, with $x = (x_1, x_2)$ the horizontal coordinates. The gravitational potential is $U(x) := m g \, f(x)$; we use normalized units so that $m g = 1$. In the unconstrained case, the equilibrium point is a stationary point of the potential, that is, $\nabla f(x) = 0$. The generalized force due to gravity acting on the ball at a generic point $x$ is $F_f = -\nabla f(x)$. Note that the motion along the vertical axis is completely determined by $f$ (i.e., the system has two degrees of freedom). An equality constraint can be seen as a rail on which the ball is forced to slide. The rail is modeled as a curve of the form $h(x) = 0$. Such a constraint exerts a generalized reaction force at each point normal to the curve, hence in the direction of $\nabla h(x)$: $F_h = -\lambda \nabla h(x)$. The magnitude of this force depends on how much the gravitational force is pushing against the constraint. The static equilibrium condition is $-\nabla f(x) - \lambda \nabla h(x) = 0$. On the other hand, inequality constraints can be seen as barriers: the ball is forced to lie inside these barriers. Consider the inequality constraint $g(x) \le 0$. When the constraint is inactive, it is not exerting any force on the ball and hence it is not affecting the equilibrium equation. When the constraint is active, the direction of the force exerted is $-\nu \nabla g(x)$ with $\nu \ge 0$. Compared to rails, barriers are unilateral constraints and can only exert forces in one sense; this is the physical interpretation of the nonnegativity of the associated Lagrange multipliers $\nu_i \ge 0$. The equilibrium equation in the presence of both equality and inequality constraints becomes the first KKT condition. This interpretation is illustrated in Figure 2.9. The gravitational and reaction forces are $F_f = -\nabla f(x)$, $F_h = -\lambda \nabla h(x)$ and $F_g = -\nu \nabla g(x)$. The force balance $F_f + F_h + F_g = 0$ is therefore the KKT stationarity condition with reversed sign:
$$\nabla f(x) + \lambda \nabla h(x) + \nu \nabla g(x) = 0.$$
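In this light, Example 2.15 (as reconstructed above) can be read mechanically: a ball on the tilted plane $f(x) = x_1 + x_2$ slides along the circular rail $h(x) = x_1^2 + x_2^2 - 2 = 0$ and comes to rest at the lowest point of the rail, $x^* = (-1, -1)$, where the rail's reaction force $F_h = -\lambda^* \nabla h(x^*) = -\frac{1}{2} (-2, -2)^T = (1, 1)^T$ exactly balances the gravitational force $F_f = -\nabla f(x^*) = (-1, -1)^T$.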