
2.7 Problems with equality constraints

In this section, we shall consider nonlinear programming problems with equality constraints of the form:

$$\text{minimize } f(x) \quad \text{subject to } h_i(x) = 0, \quad i = 1, \ldots, n_h$$

Based on the material presented in 2.6, it is tempting to convert this problem into an inequality constrained problem, by replacing each equality constraint $h_i(x) = 0$ by two inequality constraints $h_i^+(x) = h_i(x) \le 0$ and $h_i^-(x) = -h_i(x) \le 0$. Given a feasible point $\bar{x} \in \mathbb{R}^{n_x}$, we would have $h_i^+(\bar{x}) = h_i^-(\bar{x}) = 0$ and $\nabla h_i^+(\bar{x}) = -\nabla h_i^-(\bar{x})$. Therefore, there can exist no vector $d$ such that $\nabla h_i^+(\bar{x})^\top d < 0$ and $\nabla h_i^-(\bar{x})^\top d < 0$ simultaneously, i.e., $\mathcal{D}_0(\bar{x}) = \emptyset$. In other words, the geometric conditions developed in the previous section for inequality constrained problems are satisfied by all feasible solutions and, hence, are not informative. A different approach must therefore be used to deal with equality constrained problems. After a number of preliminary results in 2.7.1, we shall describe the method of Lagrange multipliers for equality constrained problems in 2.7.2.

2.7.1 Preliminaries

An equality constraint $h(x) = 0$ defines a set in $\mathbb{R}^{n_x}$, which can be seen as a hypersurface.

When considering $n_h \ge 1$ equality constraints $h_1(x), \ldots, h_{n_h}(x)$, their intersection forms a (possibly empty) set $S := \{x \in \mathbb{R}^{n_x} : h_i(x) = 0,\ i = 1, \ldots, n_h\}$.

Throughout this section, we shall assume that the equality constraints are differentiable; the set $S := \{x \in \mathbb{R}^{n_x} : h_i(x) = 0,\ i = 1, \ldots, n_h\}$ is then said to be a differentiable manifold (or smooth manifold). Associated with a point on a differentiable manifold is the tangent set at that point. To formalize this notion, we start by defining curves on a manifold. A curve $\xi$ on a manifold $S$ is a continuous application $\xi : I \to S$, with $I \subset \mathbb{R}$ an interval, i.e., a family of points $\xi(t) \in S$ continuously parameterized by $t \in I$. A curve is said to pass through the point $\bar{x}$ if $\bar{x} = \xi(\bar{t})$ for some $\bar{t} \in I$; the derivative of a curve at $\bar{t}$, provided it exists, is defined as $\dot{\xi}(\bar{t}) := \lim_{h \to 0} \frac{\xi(\bar{t} + h) - \xi(\bar{t})}{h}$. A curve is said to be differentiable (or smooth) if a derivative exists for each $t \in I$.

Definition 2.21: Tangent Set

Let $S$ be a (differentiable) manifold in $\mathbb{R}^{n_x}$, and let $\bar{x} \in S$. Consider the collection of all the continuously differentiable curves on $S$ passing through $\bar{x}$. Then, the collection of all the vectors tangent to these curves at $\bar{x}$ is said to be the tangent set to $S$ at $\bar{x}$, denoted by $\mathcal{T}(\bar{x})$.

If the constraints are regular, in the sense of Definition 2.22 below, then $S$ is (locally) of dimension $n_x - n_h$, and $\mathcal{T}(\bar{x})$ constitutes a subspace of dimension $n_x - n_h$, called the tangent space.

Definition 2.22: Regular Point (for a Set of Equality Constraints)

Let $h_i : \mathbb{R}^{n_x} \to \mathbb{R}$, $i = 1, \ldots, n_h$, be differentiable functions on $\mathbb{R}^{n_x}$ and consider the set $S := \{x \in \mathbb{R}^{n_x} : h_i(x) = 0,\ i = 1, \ldots, n_h\}$. A point $\bar{x} \in S$ is said to be a regular point if the gradient vectors $\nabla h_i(\bar{x})$, $i = 1, \ldots, n_h$, are linearly independent, i.e.,

$$\operatorname{rank}\left(\nabla h_1(\bar{x})\ \ \nabla h_2(\bar{x})\ \ \cdots\ \ \nabla h_{n_h}(\bar{x})\right) = n_h$$

Lemma 2.3: Algebraic Characterization of a Tangent Space

Let $h_i : \mathbb{R}^{n_x} \to \mathbb{R}$, $i = 1, \ldots, n_h$, be differentiable functions on $\mathbb{R}^{n_x}$ and consider the set $S := \{x \in \mathbb{R}^{n_x} : h_i(x) = 0,\ i = 1, \ldots, n_h\}$. At a regular point $\bar{x} \in S$, the tangent space is such that

$$\mathcal{T}(\bar{x}) = \{d : \nabla h(\bar{x})^\top d = 0\}$$

Proof. ... □

Recall that $h_x = \frac{\partial h}{\partial x} \in \mathbb{R}^{n_h \times n_x}$ is the Jacobian of the constraints. Therefore, the tangent directions $d$ must be orthogonal to the gradient of each constraint $h_i$ at $\bar{x}$. Note also that $h_x d = 0$ defines the kernel of the Jacobian at $\bar{x}$, that is, the set $K(h_x) := \{d : h_x d = 0\}$. Since we have assumed that $\bar{x}$ is a regular point, the Jacobian has full rank and, by the rank-nullity theorem, the dimension of the kernel is $\dim(K(h_x)) = n_x - n_h$, which is precisely the dimension of the tangent space.
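Numerically, this characterization suggests computing a basis of the tangent space as the null space of the constraint Jacobian. Below is a minimal sketch in Python (assuming NumPy and SciPy are available; the constraint and the point are our own illustrative choices):

```python
import numpy as np
from scipy.linalg import null_space

# Illustrative single constraint h(x) = x1^2 + x2^2 - 2 (n_h = 1, n_x = 2).
def constraint_jacobian(x):
    """Jacobian h_x of the constraints, of shape (n_h, n_x)."""
    return np.array([[2.0 * x[0], 2.0 * x[1]]])

x_bar = np.array([1.0, 1.0])                 # feasible: 1 + 1 - 2 = 0
J = constraint_jacobian(x_bar)

# Regularity check: the Jacobian must have full row rank n_h.
assert np.linalg.matrix_rank(J) == J.shape[0]

# Tangent space T(x_bar) = ker(J); by rank-nullity its dimension is
# n_x - n_h = 1.
E = null_space(J)                            # shape (n_x, n_x - n_h)
print(E)                                     # a multiple of (1, -1)/sqrt(2)
assert np.allclose(J @ E, 0.0)
```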

2.7.2 The Method of Lagrange Multipliers

The idea behind the method of Lagrange multipliers for solving equality constrained NLP problems of the form

$$\text{minimize } f(x) \quad \text{subject to } h_i(x) = 0, \quad i = 1, \ldots, n_h$$

is to restrict the search for a minimum to the manifold $S := \{x \in \mathbb{R}^{n_x} : h_i(x) = 0,\ i = 1, \ldots, n_h\}$. In other words, we derive optimality conditions by considering the value of the objective function along curves on the manifold $S$ passing through the optimal point.

The following theorem shows that the tangent space $\mathcal{T}(x^*)$ at a regular (local) minimum point $x^*$ is orthogonal to the gradient of the objective function at $x^*$. This important fact is illustrated in Fig. ?? in the case of a single equality constraint.

Theorem 2.12: Geometric Necessary Condition for a Local Minimum

Let $f : \mathbb{R}^{n_x} \to \mathbb{R}$ and $h_i : \mathbb{R}^{n_x} \to \mathbb{R}$, $i = 1, \ldots, n_h$, be continuously differentiable functions on $\mathbb{R}^{n_x}$. Suppose that $x^*$ is a local minimum point of the problem to minimize $f(x)$ subject to the constraints $h(x) = 0$. Then, $\nabla f(x^*)$ is orthogonal to the tangent space $\mathcal{T}(x^*)$, that is:

$$\mathcal{F}_0(x^*) \cap \mathcal{T}(x^*) = \emptyset$$

where $\mathcal{F}_0(x^*) := \{d : \nabla f(x^*)^\top d < 0\}$ denotes the set of descent directions of $f$ at $x^*$ (see 2.6).

Proof. By contradiction, assume that there exists a $d \in \mathcal{T}(x^*)$ such that $\nabla f(x^*)^\top d \neq 0$. Let $\xi : I = [-a, a] \to S$, $a > 0$, be any smooth curve passing through $x^*$ with $\xi(0) = x^*$ and $\dot{\xi}(0) = d$. Let also $\varphi$ be the function defined as $\varphi(t) := f(\xi(t))$, $\forall t \in I$. Since $x^*$ is a local minimum of $f$ on $S := \{x \in \mathbb{R}^{n_x} : h_i(x) = 0,\ i = 1, \ldots, n_h\}$, by Definition 2.11, we have

$$\exists \eta > 0 \text{ such that } \varphi(t) = f(\xi(t)) \ge f(x^*) = \varphi(0), \quad \forall t \in \mathcal{B}_\eta(0)$$

It follows that $t = 0$ is an unconstrained (local) minimum point for $\varphi$, and

$$0 = \dot{\varphi}(0) = \nabla f(x^*)^\top \dot{\xi}(0) = \nabla f(x^*)^\top d$$

We thus get a contradiction with the fact that $\nabla f(x^*)^\top d \neq 0$. □

Next, we take advantage of the foregoing geometric characterization, and derive first-order necessary conditions for equality constrained NLP problems.

Theorem 2.13: First-Order Necessary Conditions

Let $f : \mathbb{R}^{n_x} \to \mathbb{R}$ and $h_i : \mathbb{R}^{n_x} \to \mathbb{R}$, $i = 1, \ldots, n_h$, be continuously differentiable functions on $\mathbb{R}^{n_x}$. Consider the problem to minimize $f(x)$ subject to the constraints $h(x) = 0$. If $x^*$ is a local minimum and is a regular point of the constraints, then there exists a unique vector $\lambda^* \in \mathbb{R}^{n_h}$ such that:

$$\nabla f(x^*) + \nabla h(x^*)\,\lambda^* = \nabla f(x^*) + \sum_{i=1}^{n_h} \lambda_i^* \nabla h_i(x^*) = 0$$

Proof. Since $x^*$ is a local minimum of $f$ on $S := \{x \in \mathbb{R}^{n_x} : h(x) = 0\}$, by Theorem 2.12, we have $\mathcal{F}_0(x^*) \cap \mathcal{T}(x^*) = \emptyset$, i.e., the system

$$\nabla f(x^*)^\top d < 0, \qquad \nabla h(x^*)^\top d = 0$$

is inconsistent. Consider the following two sets:

$$C_1 := \{(z_1, z_2) \in \mathbb{R}^{n_h + 1} : z_1 = \nabla f(x^*)^\top d,\ z_2 = \nabla h(x^*)^\top d,\ d \in \mathbb{R}^{n_x}\}$$
$$C_2 := \{(z_1, z_2) \in \mathbb{R}^{n_h + 1} : z_1 < 0,\ z_2 = 0\}$$

Clearly, $C_1$ and $C_2$ are convex and $C_1 \cap C_2 = \emptyset$. Then, by the separation theorem (Theorem 2.14), there exists a nonzero vector $(\mu, \lambda) \in \mathbb{R}^{n_h + 1}$ such that

$$\mu\,\nabla f(x^*)^\top d + \lambda^\top\left[\nabla h(x^*)^\top d\right] \ge \mu z_1 + \lambda^\top z_2, \quad \forall d \in \mathbb{R}^{n_x},\ \forall (z_1, z_2) \in \operatorname{cl}(C_2)$$

Letting $z_2 = 0$ and since $z_1$ can be made an arbitrarily large negative number, it follows that $\mu \ge 0$ (if $\mu < 0$, then since $z_1 < 0$, the lower bound $\mu z_1$ on the left-hand side would be positive and arbitrarily large). Also, letting $(z_1, z_2) = (0, 0) \in \operatorname{cl}(C_2)$, we must have $[\mu \nabla f(x^*) + \nabla h(x^*)\lambda]^\top d \ge 0$, for each $d \in \mathbb{R}^{n_x}$. In particular, letting $d = -[\mu \nabla f(x^*) + \nabla h(x^*)\lambda]$, it follows that $-\|\mu \nabla f(x^*) + \nabla h(x^*)\lambda\|^2 \ge 0$, and thus,

$$\mu \nabla f(x^*) + \nabla h(x^*)\,\lambda = 0, \quad \text{with } (\mu, \lambda) \neq (0, 0)$$

Finally, note that $\mu > 0$, for otherwise the above equation would contradict the assumption of linear independence of $\nabla h_i(x^*)$, $i = 1, \ldots, n_h$. The result follows by letting $\lambda^* := \frac{1}{\mu}\lambda$, and noting that the linear independence assumption implies the uniqueness of these Lagrange multipliers. □

Theorem 2.14: Separation of Two Convex Sets

Let $C_1$ and $C_2$ be two nonempty, convex sets in $\mathbb{R}^n$ and suppose that $C_1 \cap C_2 = \emptyset$. Then, there exists a hyperplane that separates $C_1$ and $C_2$; that is, there exists a nonzero vector $p \in \mathbb{R}^n$ such that

$$p^\top x_1 \ge p^\top x_2, \quad \forall x_1 \in \operatorname{cl}(C_1),\ \forall x_2 \in \operatorname{cl}(C_2)$$

Remark 2.22: Obtaining Candidate Solution Points

The first-order necessary conditions

$$\nabla f(x^*) + \nabla h(x^*)\,\lambda^* = 0$$

together with the constraints

$$h(x^*) = 0$$

give a total of $n_x + n_h$ (typically nonlinear) equations in the $n_x + n_h$ variables $(x^*, \lambda^*)$. Hence, these conditions are complete in the sense that they determine, at least locally, a unique solution. However, as in the unconstrained case, a solution to the first-order necessary conditions need not be a (local) minimum of the original problem; it could very well correspond to a (local) maximum point, or some kind of saddle point. These considerations are illustrated in Example 2.15 below.
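For small problems, this square system can be handed directly to a generic root finder. The following is a minimal sketch with SciPy's fsolve on a toy problem of our choosing (minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 - 1 = 0$, whose solution $x^* = (1/2, 1/2)$, $\lambda^* = -1$ is easy to verify by hand); keep in mind that a root of this system is only a candidate and still needs to be classified:

```python
import numpy as np
from scipy.optimize import fsolve

# Toy problem (our choice): minimize x1^2 + x2^2  s.t.  x1 + x2 - 1 = 0.
def first_order_conditions(z):
    """n_x + n_h equations in the n_x + n_h unknowns (x, lambda)."""
    x, lam = z[:2], z[2]
    grad_f = 2.0 * x                        # gradient of the objective
    grad_h = np.array([1.0, 1.0])           # gradient of the constraint
    stationarity = grad_f + lam * grad_h    # grad_x L(x, lambda) = 0
    feasibility = [x[0] + x[1] - 1.0]       # h(x) = 0
    return np.concatenate([stationarity, feasibility])

z0 = np.array([0.0, 1.0, 0.0])              # initial guess for (x, lambda)
x1, x2, lam = fsolve(first_order_conditions, z0)
print(x1, x2, lam)                          # -> 0.5 0.5 -1.0
```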

Remark 2.23: Regularity-Type Assumption

It is important to note that for a local minimum to satisfy the foregoing first-order conditions and, in particular, for a unique Lagrange multiplier vector to exist, it is necessary that the equality constraints satisfy a regularity condition. In other words, the first-order conditions may not hold at a local minimum point that is non-regular. An illustration of these considerations is provided in Example 2.16.

There are a number of similarities with the constraint qualification needed for a local minimizer of an inequality constrained NLP problem to be a KKT point; in particular, the condition that the minimum point be a regular point for the constraints corresponds to LICQ (see Remark 2.21).

Remark 2.24: Lagrangian

It is convenient to introduce the Lagrangian $\mathcal{L} : \mathbb{R}^{n_x} \times \mathbb{R}^{n_h} \to \mathbb{R}$ associated with the constrained problem, by adjoining the cost and constraint functions as:

$$\mathcal{L}(x, \lambda) := f(x) + \lambda^\top h(x)$$

Thus, if $x^*$ is a local minimum which is regular, the first-order necessary conditions are written as

$$\nabla_x \mathcal{L}(x^*, \lambda^*) = 0, \qquad \nabla_\lambda \mathcal{L}(x^*, \lambda^*) = 0$$

the latter equations being simply a restatement of the constraints. Note that the solution of the original problem typically corresponds to a saddle point of the Lagrangian function.

Example 2.15 (Regular Case). Consider the problem

$$\min_{x \in \mathbb{R}^2} f(x) := x_1 + x_2 \quad \text{s.t.} \quad h(x) := x_1^2 + x_2^2 - 2 = 0$$

Observe first that every feasible point is a regular point for the equality constraint, since $\nabla h(x) = (2x_1, 2x_2)^\top$ vanishes only at the origin and the point $(0, 0)$ is infeasible. Therefore, every local minimum is a stationary point of the Lagrangian function by Theorem 2.13.

The gradient vectors $\nabla f(x)$ and $\nabla h(x)$ are given by

$$\nabla f(x) = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad \text{and} \quad \nabla h(x) = \begin{pmatrix} 2x_1 \\ 2x_2 \end{pmatrix}$$

so that the first-order necessary conditions read

$$1 + 2\lambda x_1 = 0, \qquad 1 + 2\lambda x_2 = 0, \qquad x_1^2 + x_2^2 = 2$$

These three equations can be solved for the three unknowns $x_1$, $x_2$ and $\lambda$. Two candidate local minimum points are obtained: (i) $x_1^* = x_2^* = -1$, $\lambda^* = \frac{1}{2}$, and (ii) $x_1^* = x_2^* = 1$, $\lambda^* = -\frac{1}{2}$. These results are illustrated in Fig. 2.8. It can be seen that only the former actually corresponds to a local minimum point, while the latter gives a local maximum point.
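These candidates are easily reproduced symbolically; a minimal sketch using SymPy (our tooling choice, not part of the original example):

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
f = x1 + x2
h = x1**2 + x2**2 - 2
L = f + lam * h                        # Lagrangian of Example 2.15

# Stationarity in x together with the constraint h(x) = 0
candidates = sp.solve([sp.diff(L, x1), sp.diff(L, x2), h],
                      [x1, x2, lam], dict=True)
print(candidates)
# -> [{x1: -1, x2: -1, lam: 1/2}, {x1: 1, x2: 1, lam: -1/2}]
```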

Example 2.16 (Non-Regular Case). Consider the problem

$$\min_{x \in \mathbb{R}^2} f(x) := x_1 \quad \text{s.t.} \quad h_1(x) := (1 - x_1)^3 + x_2 = 0, \quad h_2(x) := (1 - x_1)^3 - x_2 = 0$$

As shown in Fig. 1.13, this problem has only one feasible point, namely, $x^* = (1, 0)^\top$; that is, $x^*$ is also the unique global minimum of the problem. However, at this point, we have

$$\nabla f(x^*) = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \nabla h_1(x^*) = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \quad \text{and} \quad \nabla h_2(x^*) = \begin{pmatrix} 0 \\ -1 \end{pmatrix}$$

hence the first-order conditions

$$\lambda_1 \begin{pmatrix} 0 \\ 1 \end{pmatrix} + \lambda_2 \begin{pmatrix} 0 \\ -1 \end{pmatrix} = -\begin{pmatrix} 1 \\ 0 \end{pmatrix}$$

cannot be satisfied. This illustrates the fact that a minimum point may not be a stationary point for the Lagrangian if that point is non-regular.
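Numerically, the failure of regularity shows up as a rank-deficient constraint Jacobian at $x^*$; a quick check (a sketch, with our own helper names):

```python
import numpy as np

def constraint_jacobian(x):
    """Jacobian of h1(x) = (1-x1)^3 + x2 and h2(x) = (1-x1)^3 - x2."""
    return np.array([[-3.0 * (1.0 - x[0])**2,  1.0],
                     [-3.0 * (1.0 - x[0])**2, -1.0]])

x_star = np.array([1.0, 0.0])           # the unique feasible point
J = constraint_jacobian(x_star)
print(J)                                 # [[ 0.  1.] [ 0. -1.]]
print(np.linalg.matrix_rank(J))          # 1 < n_h = 2: x* is not regular
```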

Figure 2.8: Solution of Example 2.15

The following theorem provides second-order necessary conditions for a point to be a local minimum of a NLP problem with equality constraints.

Theorem 2.15: Second-Order Necessary Conditions

Let $f : \mathbb{R}^{n_x} \to \mathbb{R}$ and $h_i : \mathbb{R}^{n_x} \to \mathbb{R}$, $i = 1, \ldots, n_h$, be twice continuously differentiable functions on $\mathbb{R}^{n_x}$. Consider the problem to minimize $f(x)$ subject to the constraints $h(x) = 0$. If $x^*$ is a local minimum and is a regular point of the constraints, then there exists a unique vector $\lambda^* \in \mathbb{R}^{n_h}$ such that

$$\nabla f(x^*) + \nabla h(x^*)\,\lambda^* = 0$$

and

$$d^\top \left(\nabla^2 f(x^*) + \sum_{i=1}^{n_h} \lambda_i^* \nabla^2 h_i(x^*)\right) d \ge 0, \quad \forall d \text{ such that } \nabla h(x^*)^\top d = 0$$

Proof. Note first that $\nabla f(x^*) + \nabla h(x^*)\,\lambda^* = 0$ directly follows from Theorem 2.13. Let $d$ be an arbitrary direction in $\mathcal{T}(x^*)$; that is, $\nabla h(x^*)^\top d = 0$ since $x^*$ is a regular point (see Lemma 2.3). Consider any twice differentiable curve $\xi : I = [-a, a] \to S$, $a > 0$, passing through $x^*$ with $\xi(0) = x^*$ and $\dot{\xi}(0) = d$. Let $\varphi$ be the function defined as $\varphi(t) := f(\xi(t))$, $\forall t \in I$. Since $x^*$ is a local minimum of $f$ on $S := \{x \in \mathbb{R}^{n_x} : h(x) = 0\}$, $t = 0$ is an unconstrained (local) minimum point for $\varphi$. By Theorem 2.4, it follows that

$$0 \le \ddot{\varphi}(0) = \dot{\xi}(0)^\top \nabla^2 f(x^*)\, \dot{\xi}(0) + \nabla f(x^*)^\top \ddot{\xi}(0)$$

(note that $\varphi$ is a scalar function; as a consequence, $\ddot{\varphi} = \frac{d^2\varphi}{dt^2}$).

Furthermore, differentiating the relation $\lambda^{*\top} h(\xi(t)) = 0$ twice, we obtain

$$\dot{\xi}(0)^\top \left(\sum_{i=1}^{n_h} \lambda_i^* \nabla^2 h_i(x^*)\right) \dot{\xi}(0) + \left(\nabla h(x^*)\,\lambda^*\right)^\top \ddot{\xi}(0) = 0$$

Adding the last two equations and using the first-order condition $\nabla f(x^*) + \nabla h(x^*)\,\lambda^* = 0$ yields

$$d^\top \left(\nabla^2 f(x^*) + \sum_{i=1}^{n_h} \lambda_i^* \nabla^2 h_i(x^*)\right) d \ge 0$$

and this condition must hold for every $d$ such that $\nabla h(x^*)^\top d = 0$. □

It is useful to shed more light on the derivation in the foregoing proof. To this end, we carry out the derivatives of $\psi(t) := \lambda^{*\top} h(\xi(t))$ explicitly. Note that since the curve $\xi$ lies in $S$, we have $h_i(\xi(t)) = 0$, $i = 1, \ldots, n_h$, $\forall t$. Hence $\psi(t)$ is identically zero (i.e., $\psi(t) \equiv 0$, $\forall t$). We can expand $\psi$ as $\psi(t) = \sum_{i=1}^{n_h} \lambda_i^* h_i(\xi(t))$. The first total derivative is:

$$\dot{\psi} = \sum_{i=1}^{n_h} \sum_{j=1}^{n_x} \lambda_i^* \frac{\partial h_i}{\partial x_j} \dot{\xi}_j = \left[\nabla h\, \lambda^*\right]^\top \dot{\xi} = \lambda^{*\top} h_x\, \dot{\xi} = 0$$

We can now differentiate again with respect to time to derive the second total derivative:

$$\ddot{\psi} = \sum_{i=1}^{n_h} \sum_{j=1}^{n_x} \sum_{k=1}^{n_x} \lambda_i^* \frac{\partial^2 h_i}{\partial x_k \partial x_j} \dot{\xi}_k \dot{\xi}_j + \sum_{i=1}^{n_h} \sum_{j=1}^{n_x} \lambda_i^* \frac{\partial h_i}{\partial x_j} \ddot{\xi}_j = \sum_{i=1}^{n_h} \lambda_i^*\, \dot{\xi}^\top \nabla^2 h_i\, \dot{\xi} + \sum_{i=1}^{n_h} \lambda_i^*\, \nabla h_i^\top \ddot{\xi} = 0$$

Recall that $\xi(0) = x^*$ and $\dot{\xi}(0) = d$, and that the first-order necessary conditions hold: $\nabla f(x^*) = -\nabla h(x^*)\,\lambda^* = -\sum_{i=1}^{n_h} \lambda_i^* \nabla h_i(x^*)$. Therefore, the second derivative of $\psi$ at $0$ gives:

$$\ddot{\psi}(0) = \sum_{i=1}^{n_h} \lambda_i^*\, d^\top \nabla^2 h_i(x^*)\, d - \nabla f(x^*)^\top \ddot{\xi}(0) = 0$$

Hence:

$$\nabla f(x^*)^\top \ddot{\xi}(0) = \sum_{i=1}^{n_h} \lambda_i^*\, d^\top \nabla^2 h_i(x^*)\, d$$

Using the last equation, it is easier to follow the proof of Theorem 2.15: substituting $\nabla f(x^*)^\top \ddot{\xi}(0)$ into $0 \le \ddot{\varphi}(0)$ directly yields $d^\top \left(\nabla^2 f(x^*) + \sum_{i=1}^{n_h} \lambda_i^* \nabla^2 h_i(x^*)\right) d \ge 0$.

Remark 2.25: Eigenvalues in Tangent Space

In the foregoing theorem, it is shown that the matrix $\nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)$ restricted to the subspace $\mathcal{T}(x^*)$ plays a key role. Geometrically, the restriction of $\nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)$ to $\mathcal{T}(x^*)$ corresponds to the projection $\mathcal{P}_{\mathcal{T}(x^*)}[\nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)]$.

A vector $y \in \mathcal{T}(x^*)$ is said to be an eigenvector of $\mathcal{P}_{\mathcal{T}(x^*)}[\nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)]$ if there is a real number $\mu$ such that:

$$\mathcal{P}_{\mathcal{T}(x^*)}[\nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)]\, y = \mu y$$

the corresponding $\mu$ is said to be an eigenvalue of $\mathcal{P}_{\mathcal{T}(x^*)}[\nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)]$ (these definitions coincide with the usual definitions of eigenvector and eigenvalue for real matrices).

Now, to obtain a matrix representation for $\mathcal{P}_{\mathcal{T}(x^*)}[\nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)]$, it is necessary to introduce a basis of the subspace $\mathcal{T}(x^*)$, say $E = [e_1, \ldots, e_{n_x - n_h}] \in \mathbb{R}^{n_x \times (n_x - n_h)}$. Then, the eigenvalues of $\mathcal{P}_{\mathcal{T}(x^*)}[\nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)]$ are the same as those of the $(n_x - n_h) \times (n_x - n_h)$ matrix $E^\top \nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)\, E$ when the basis $E$ is orthonormal; for an arbitrary basis, the eigenvalues of $E^\top \nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)\, E$ depend on the choice of $E$, but their signs, and hence the definiteness properties, do not (Sylvester's law of inertia).
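In computations, a convenient choice is the orthonormal basis of $\ker(\nabla h(x^*)^\top)$ returned by a null-space routine, so that the eigenvalues of $E^\top \nabla^2_{xx}\mathcal{L}\,E$ are exactly those of the projected operator. A minimal sketch (the helper name and toy data are ours):

```python
import numpy as np
from scipy.linalg import null_space

def reduced_hessian(hess_L, jac_h):
    """Restriction of the Hessian of the Lagrangian to the tangent space
    T(x*) = ker(jac_h), using an orthonormal basis E of the kernel."""
    E = null_space(jac_h)               # E in R^{n_x x (n_x - n_h)}
    return E.T @ hess_L @ E

# Toy check: minimize x1^2 + x2^2 s.t. x1 + x2 - 1 = 0, at x* = (1/2, 1/2):
H = 2.0 * np.eye(2)                     # Hessian of L (h is linear)
J = np.array([[1.0, 1.0]])              # Jacobian of h
print(np.linalg.eigvalsh(reduced_hessian(H, J)))   # -> [2.]: positive
```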

Example 2.17 (Regular Case Continued). Consider again Example 2.15. Two candidate local minimum points, (i) $x_1^* = x_2^* = -1$, $\lambda^* = \frac{1}{2}$, and (ii) $x_1^* = x_2^* = 1$, $\lambda^* = -\frac{1}{2}$, were obtained on application of the first-order necessary conditions. The Hessian matrix of the Lagrangian function is given by

$$\nabla^2_{xx}\mathcal{L}(x, \lambda) = \nabla^2 f(x) + \lambda \nabla^2 h(x) = \lambda \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$$

and a basis of the tangent subspace $\mathcal{T}(x)$ at a feasible point $x$ is

$$E(x) := \begin{pmatrix} -x_2 \\ x_1 \end{pmatrix}$$

Therefore,

$$E^\top \nabla^2_{xx}\mathcal{L}(x, \lambda)\, E = 2\lambda\,(x_1^2 + x_2^2)$$

Since every feasible point satisfies the constraint $h(x) = x_1^2 + x_2^2 - 2 = 0$, we have:

$$E^\top \nabla^2_{xx}\mathcal{L}(x, \lambda)\, E = 4\lambda$$

In particular, for the candidate solution point (i), we have

$$E^\top \nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)\, E = 4\lambda^* = 2 > 0$$

hence satisfying the second-order necessary conditions (in fact, this point also satisfies the second-order sufficient conditions of optimality discussed hereafter). On the other hand, for the candidate solution point (ii), we get

$$E^\top \nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)\, E = 4\lambda^* = -2 < 0$$

which does not satisfy the second-order necessary requirement, so this point cannot be a local minimum.
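The same conclusion is reached numerically with the reduced_hessian helper sketched in Remark 2.25 (the eigenvalues come out as $\pm 1$ rather than $\pm 2$ because the null-space basis is orthonormal, unlike $E(x)$ above; only the signs matter):

```python
import numpy as np
from scipy.linalg import null_space

def reduced_hessian(hess_L, jac_h):
    E = null_space(jac_h)               # orthonormal basis of the kernel
    return E.T @ hess_L @ E

# Candidates of Example 2.15: (x*, lambda*) pairs.
for x, lam in [(np.array([-1.0, -1.0]),  0.5),
               (np.array([ 1.0,  1.0]), -0.5)]:
    H = lam * 2.0 * np.eye(2)                 # Hessian of the Lagrangian
    J = np.array([[2.0 * x[0], 2.0 * x[1]]])  # Jacobian of h at x*
    print(np.linalg.eigvalsh(reduced_hessian(H, J)))
# -> [1.] for candidate (i), [-1.] for candidate (ii)
```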

The conditions given in Theorems 2.13 and 2.15 are necessary conditions that must hold at each local minimum point. Yet, a point satisfying these conditions need not be a local minimum. The following theorem provides sufficient conditions for a stationary point of the Lagrangian function to be a (local) minimum, namely that the Hessian matrix of the Lagrangian function be positive definite along directions in the tangent space of the constraints.

Theorem 2.16: Second-Order Sufficient Conditions

Let $f : \mathbb{R}^{n_x} \to \mathbb{R}$ and $h_i : \mathbb{R}^{n_x} \to \mathbb{R}$, $i = 1, \ldots, n_h$, be twice continuously differentiable functions on $\mathbb{R}^{n_x}$. Consider the problem to minimize $f(x)$ subject to the constraints $h(x) = 0$. If $x^*$ and $\lambda^*$ satisfy

$$\nabla_x \mathcal{L}(x^*, \lambda^*) = \nabla f(x^*) + \sum_{i=1}^{n_h} \lambda_i^* \nabla h_i(x^*) = 0, \qquad \nabla_\lambda \mathcal{L}(x^*, \lambda^*) = h(x^*) = 0$$

and

$$y^\top \nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)\, y > 0, \quad \forall y \neq 0 \text{ such that } \nabla h(x^*)^\top y = 0$$

where $\mathcal{L}(x, \lambda) = f(x) + \lambda^\top h(x)$, then $x^*$ is a strict local minimum.

Proof. Consider the augmented Lagrangian function

$$\bar{\mathcal{L}}(x, \lambda) = f(x) + \lambda^\top h(x) + \frac{c}{2}\,\|h(x)\|^2$$

where $c$ is a scalar penalty parameter. We have

$$\nabla_x \bar{\mathcal{L}}(x^*, \lambda^*) = \nabla_x \mathcal{L}(x^*, \bar{\lambda}), \qquad \nabla^2_{xx} \bar{\mathcal{L}}(x^*, \lambda^*) = \nabla^2_{xx} \mathcal{L}(x^*, \bar{\lambda}) + c\, \nabla h(x^*)\, \nabla h(x^*)^\top$$

where $\bar{\lambda} = \lambda^* + c\, h(x^*)$ (so that $\bar{\lambda} = \lambda^*$, since $h(x^*) = 0$). Since $(x^*, \lambda^*)$ satisfy the sufficient conditions, by Lemma 2.4, we obtain

$$\nabla_x \bar{\mathcal{L}}(x^*, \lambda^*) = 0 \quad \text{and} \quad \nabla^2_{xx} \bar{\mathcal{L}}(x^*, \lambda^*) \succ 0$$

for sufficiently large $c$. The Hessian of $\bar{\mathcal{L}}$ being positive definite at $(x^*, \lambda^*)$,

$$\exists \varrho > 0,\ \delta > 0 \text{ such that } \bar{\mathcal{L}}(x, \lambda^*) \ge \bar{\mathcal{L}}(x^*, \lambda^*) + \frac{\varrho}{2}\|x - x^*\|^2 \quad \text{for } \|x - x^*\| < \delta$$

Finally, since $\bar{\mathcal{L}}(x, \lambda^*) = f(x)$ when $h(x) = 0$, and $\bar{\mathcal{L}}(x^*, \lambda^*) = f(x^*)$, we get

$$f(x) \ge f(x^*) + \frac{\varrho}{2}\|x - x^*\|^2 \quad \text{if } h(x) = 0,\ \|x - x^*\| < \delta$$

i.e., $x^*$ is a strict local minimum. □
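The role of the penalty term can be seen on a small synthetic example (matrices chosen by us for illustration): with an indefinite $\nabla^2_{xx}\mathcal{L}$ whose restriction to the tangent space is positive definite, adding $c\,\nabla h(x^*)\nabla h(x^*)^\top$ renders the full matrix positive definite once $c$ is large enough.

```python
import numpy as np

H = np.array([[1.0,  0.0],
              [0.0, -1.0]])            # indefinite Hessian of the Lagrangian
a = np.array([[0.0], [1.0]])           # grad h(x*): tangent space is span{e1},
                                       # on which H is positive definite
for c in (0.5, 1.0, 2.0):
    H_bar = H + c * (a @ a.T)          # Hessian of the augmented Lagrangian
    print(c, np.linalg.eigvalsh(H_bar))
# c = 0.5 -> eigenvalues [-0.5, 1.]; c = 2.0 -> [1., 1.]: positive definite
```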

Example 2.18. Consider the problem

$$\min_{x \in \mathbb{R}^3} f(x) := -x_1 x_2 - x_1 x_3 - x_2 x_3 \quad \text{s.t.} \quad h(x) := x_1 + x_2 + x_3 - 3 = 0$$

The first-order conditions for this problem are

$$-(x_2 + x_3) + \lambda = 0, \qquad -(x_1 + x_3) + \lambda = 0, \qquad -(x_1 + x_2) + \lambda = 0$$

together with the equality constraint. It is easily checked that the point $x_1^* = x_2^* = x_3^* = 1$, $\lambda^* = 2$ satisfies these conditions. Moreover,

$$\nabla^2_{xx}\mathcal{L}(x^*, \lambda^*) = \nabla^2 f(x^*) = \begin{pmatrix} 0 & -1 & -1 \\ -1 & 0 & -1 \\ -1 & -1 & 0 \end{pmatrix}$$

and a basis of the tangent space to the constraint $h(x) = 0$ at $x^*$ is

$$E := \begin{pmatrix} 0 & 2 \\ 1 & -1 \\ -1 & -1 \end{pmatrix}$$

We thus obtain

$$E^\top \nabla^2_{xx}\mathcal{L}(x^*, \lambda^*)\, E = \begin{pmatrix} 2 & 0 \\ 0 & 6 \end{pmatrix}$$

which is clearly a positive definite matrix. Hence, $x^*$ is a strict local minimum of the problem. (Interestingly enough, the Hessian matrix of the objective function itself is indefinite at $x^*$ in this case.)
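These computations are easy to verify numerically (a sketch; note that the basis $E$ used here is the one from the example and is not orthonormal):

```python
import numpy as np

H = np.array([[ 0.0, -1.0, -1.0],
              [-1.0,  0.0, -1.0],
              [-1.0, -1.0,  0.0]])     # Hessian of L at x* = (1, 1, 1)
E = np.array([[ 0.0,  2.0],
              [ 1.0, -1.0],
              [-1.0, -1.0]])           # basis of ker(grad h^T), grad h = (1,1,1)

print(np.ones(3) @ E)                  # -> [0. 0.]: columns are tangent
print(E.T @ H @ E)                     # -> [[2. 0.] [0. 6.]]: positive definite
print(np.linalg.eigvalsh(H))           # -> [-2. 1. 1.]: H itself is indefinite
```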

We close this section by providing insight into the Lagrange multipliers.

Remark 2.26: A First Interpretation of the Lagrange Multipliers

The concept of Lagrange multipliers allows one to adjoin the constraints to the objective function. That is, one can view constrained optimization as a search for a vector $x^*$ at which the gradient of the objective function is a linear combination of the gradients of the constraints.

Another insightful interpretation of the Lagrange multipliers is as follows. Consider the family of perturbed problems $v(y) := \min\{f(x) : h(x) = y\}$. Suppose that there is a unique regular solution point for each $y$, and let $\xi^*(y) := \arg\min\{f(x) : h(x) = y\}$ denote the evolution of the optimal solution point as a function of the perturbation parameter $y$. Clearly,

$$v(0) = f(x^*) \quad \text{and} \quad \xi^*(0) = x^*$$

Moreover, since $h(\xi^*(y)) = y$ for each $y$, we have:

$$\frac{d}{dy}\, h(\xi^*(y)) = 1 = \sum_{j=1}^{n_x} \frac{\partial h}{\partial x_j} \frac{d\xi_j^*}{dy} = \nabla h^\top \frac{d\xi^*}{dy}$$

Denoting by $\lambda^*$ the Lagrange multiplier associated with the constraint $h(x) = 0$ in the original problem, we have

$$\left.\frac{dv}{dy}\right|_{y=0} = \nabla f(x^*)^\top \frac{d\xi^*(0)}{dy} = -\lambda^*\, \nabla h(x^*)^\top \frac{d\xi^*(0)}{dy} = -\lambda^*$$

Therefore, the Lagrange multiplier $\lambda^*$ can be interpreted as the sensitivity of the objective function $f$ with respect to the constraint $h$. Said differently, $\lambda^*$ indicates how much the optimal cost would change if the constraint were perturbed. This interpretation extends straightforwardly to NLP problems having inequality constraints. The Lagrange multiplier of an active constraint $g(x) \le 0$, say $\nu^* > 0$, can be interpreted as the sensitivity of $f(x^*)$ with respect to a change in that constraint, as $g(x) \le y$; in this case, the nonnegativity of the Lagrange multiplier follows from the fact that, by increasing $y$, the feasible region is relaxed, hence the optimal cost cannot increase. Regarding inactive constraints, the sensitivity interpretation also explains why their Lagrange multipliers are zero, as any infinitesimal change in the value of these constraints leaves the optimal cost value unchanged.
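This interpretation is easy to check on Example 2.15, where $\lambda^* = \frac{1}{2}$ at the minimum: differencing the optimal value of the perturbed problem $h(x) = y$ should give $dv/dy \approx -\frac{1}{2}$. A sketch (the closed-form minimizer below follows from the symmetry of the perturbed problem):

```python
import numpy as np

def v(y):
    """Optimal value of: min x1 + x2  s.t.  x1^2 + x2^2 - 2 = y.
    By symmetry, the minimizer is x1 = x2 = -sqrt((2 + y) / 2)."""
    return 2.0 * (-np.sqrt((2.0 + y) / 2.0))

eps = 1e-6
print((v(eps) - v(-eps)) / (2.0 * eps))    # -> approx -0.5 = -lambda*
```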

Remark 2.27: A Second Interpretation of the Lagrange Multipliers: Mechanical Analogy

It is possible to give an interesting physical interpretation of the KKT conditions and the Lagrange multipliers. Consider a ball on a hilly terrain, where the elevation of the terrain is described by the function $z = f(x)$, with $x = [x_1\ x_2]$ the horizontal coordinates. The gravitational potential is $V = mg f(x)$; we use normalized units so that $V = f(x)$. In the unconstrained case, an equilibrium point is a stationary point of the potential, that is, $\nabla V(x) = 0$. The generalized force due to gravity acting on the ball at a generic point $x$ is $F = -\nabla V(x) = -\nabla f(x)$. Note that the motion along the vertical axis is completely determined by $x$ (i.e., the system has two degrees of freedom). An equality constraint can be seen as a rail on which the ball is forced to slide, modeled as a curve of the form $h(x) = 0$. Such a constraint exerts a generalized reaction force at each point normal to the curve, hence in the direction of $\nabla h(x)$; the magnitude of this force depends on how much the gravitational force pushes against the constraint. The static equilibrium condition is $-\nabla f(x) = \lambda \nabla h(x)$. Inequality constraints, on the other hand, can be seen as barriers: the ball is forced to lie on the feasible side of them. Consider the inequality constraint $g(x) \le 0$. When the constraint is inactive, it exerts no force on the ball and hence does not affect the equilibrium equation. When the constraint is active, the reaction force is exerted in the direction $-\nabla g(x)$. Compared to rails, barriers are unilateral constraints and can only exert forces in one sense; this is the physical interpretation of the nonnegativity of the associated Lagrange multiplier, $\nu \ge 0$. In the presence of both equality and inequality constraints, the equilibrium equation becomes the first KKT condition. This interpretation is illustrated in Figure 2.9. The gravitational and reaction forces are $F = -\nabla f(x)$, $R_g = -\nu \nabla g(x)$, $R_h = -\lambda \nabla h(x)$. The force balance $F + R_g + R_h = 0$ is therefore the KKT stationarity condition with reversed sign.

Figure 2.9: Mechanical analogy: the stationarity condition corresponds to the force balance $F + R_g + R_h = 0$ (with reversed sign)