If we want to maximise a function f(x) subject to the constraint g(x)=c, we can use the method of Lagrange multipliers. The idea is to consider a new function L(x,λ)=f(x)+λ(c−g(x)). If x⋆ is a local maximum of the constrained problem, then for some λ the following conditions hold:
(1) ∂L/∂x=0: f′(x⋆)−λg′(x⋆)=0.
(2) ∂L/∂λ=0: g(x⋆)=c.
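As a quick sanity check, the two conditions can be verified numerically on a small problem. The sketch below assumes a hypothetical example that is not part of the text: maximise f(x,y)=x+y subject to g(x,y)=x²+y²=2, with the candidate point (1,1) and multiplier λ=0.5 worked out by hand. All function and variable names are illustrative.

```python
# Hypothetical example problem (an assumption, not from the text):
# maximise f(x, y) = x + y subject to g(x, y) = x^2 + y^2 = c, with c = 2.

def g(p):
    x, y = p
    return x * x + y * y

def grad_f(p):
    # Gradient of f(x, y) = x + y is constant.
    return (1.0, 1.0)

def grad_g(p):
    x, y = p
    return (2.0 * x, 2.0 * y)

c = 2.0
x_star = (1.0, 1.0)  # candidate optimum, found by hand
lam = 0.5            # candidate multiplier, found by hand

# Stationarity in x: f'(x*) - lam * g'(x*) should vanish component-wise.
stationarity = tuple(
    df - lam * dg for df, dg in zip(grad_f(x_star), grad_g(x_star))
)

# Stationarity in lam: g(x*) - c should vanish (the constraint itself).
feasibility = g(x_star) - c

print("stationarity residual:", stationarity)
print("feasibility residual:", feasibility)
```

The point (−1,−1) with λ=−0.5 passes the same two checks, although it is the constrained minimum of this example. The conditions therefore pick out candidate optima rather than certify a maximum.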
Equation (2) is sensible: we want our optimum to satisfy the constraint that we originally imposed. What is Equation (1) trying to say? Geometrically, it asks that f′(x⋆) be parallel to g′(x⋆). Why is this a good ask? Let us say that we are at a feasible point x0 (so g(x0)=c). We are interested in a wiggle x0 → x0+ϵ ≡ x1 such that:
x1 is still feasible: g(x1)=c=g(x0).
x1 is an improvement: f(x1)>f(x0).
If we want g(x1) to not change, then to first order in ϵ we need g′(x0)⋅ϵ=0.
If we want f(x1) to be larger, then to first order in ϵ we need f′(x0)⋅ϵ>0.
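If f′(x0) and g′(x0) are parallel, say f′(x0)=λg′(x0), then f′(x0)⋅ϵ=λg′(x0)⋅ϵ, so any attempt to improve f(x0+ϵ) will also change g(x0+ϵ), and thereby violate the constraint g(x0+ϵ)=c. Conversely, if they are not parallel, the part of f′(x0) orthogonal to g′(x0) gives a wiggle that improves f while keeping g fixed, so x0 cannot be an optimum.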
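The wiggle argument can be sketched numerically. The code below assumes a hypothetical example that is not part of the text: f(x,y)=x+y, g(x,y)=x²+y², c=2. It builds a candidate wiggle ϵ by removing from f′ its component along g′ (the helper name `tangent_improvement` is illustrative), then compares a feasible non-optimal point with the point (1,1), where the gradients are parallel.

```python
import math

# Hypothetical example (an assumption, not from the text):
# f(x, y) = x + y, g(x, y) = x^2 + y^2, constraint g = 2.

def grad_f(p):
    return (1.0, 1.0)

def grad_g(p):
    x, y = p
    return (2.0 * x, 2.0 * y)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def tangent_improvement(p):
    """Return f'(p) with its component along g'(p) removed.

    The result is orthogonal to g'(p), so moving along it leaves g
    unchanged to first order.
    """
    df, dg = grad_f(p), grad_g(p)
    scale = dot(df, dg) / dot(dg, dg)
    return tuple(dfi - scale * dgi for dfi, dgi in zip(df, dg))

# A feasible point that is not optimal: g(sqrt(2), 0) = 2.
x0 = (math.sqrt(2.0), 0.0)
eps = tangent_improvement(x0)
change_in_g = dot(grad_g(x0), eps)  # first-order change in g, about 0
change_in_f = dot(grad_f(x0), eps)  # first-order change in f, positive

# The point where f' and g' are parallel: no such wiggle is left.
x_star = (1.0, 1.0)
eps_star = tangent_improvement(x_star)

print("at x0:     g'.eps =", change_in_g, " f'.eps =", change_in_f)
print("at x_star: eps =", eps_star)
```

At x0 the wiggle keeps g fixed to first order while increasing f, so x0 can be improved. At (1,1) the projected direction vanishes, which is the parallel-gradient condition in action.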