If we want to maximise a function f(x) subject to the constraint g(x)=c,
we use the method of Lagrange multipliers. The idea is to consider a new
function L(x,λ)=f(x)+λ(c−g(x)). Now, if x⋆ is a local maximum,
then the following conditions hold:
(1) ∂L/∂x = 0: f′(x⋆) − λg′(x⋆) = 0.
(2) ∂L/∂λ = 0: g(x⋆) = c.
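As a concrete check, here is a minimal Python sketch on a hypothetical example that is not part of the text above: maximise f(x, y) = x + y subject to g(x, y) = x² + y² = 1. The candidate point (1/√2, 1/√2) and the multiplier λ = 1/√2 were worked out by hand for this example, and all function and variable names are illustrative.

```python
import math

def grad_f(x, y):
    # Gradient of the assumed objective f(x, y) = x + y.
    return (1.0, 1.0)

def g(x, y):
    # Assumed constraint function: g(x, y) = x^2 + y^2.
    return x * x + y * y

def grad_g(x, y):
    return (2.0 * x, 2.0 * y)

c = 1.0
x_star = y_star = 1.0 / math.sqrt(2.0)  # hand-derived candidate maximum
lam = 1.0 / math.sqrt(2.0)              # hand-derived multiplier

fx, fy = grad_f(x_star, y_star)
gx, gy = grad_g(x_star, y_star)

# First condition: f'(x*) - lam * g'(x*) = 0, checked component-wise.
stationarity = (fx - lam * gx, fy - lam * gy)
# Second condition: g(x*) = c.
feasibility = g(x_star, y_star) - c

print("stationarity residual:", stationarity)
print("feasibility residual:", feasibility)
```

Both residuals should be zero up to floating-point rounding, so this point satisfies both conditions for the assumed f and g.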
Equation (2) is sensible: we want our optimum to satisfy the constraint that
we had originally imposed. What is Equation (1) trying to say?
Geometrically, it's asking us to keep f′(x⋆) parallel to g′(x⋆), since it says f′(x⋆)=λg′(x⋆).
Why is this a good ask?
Let us say that we are at a feasible point x0, so g(x0)=c.
We are interested in wiggling x0 to a nearby point x1 ≡ x0+ϵ such that:
x1 is still feasible: g(x1)=c=g(x0).
x1 is an improvement: f(x1)>f(x0).
If we want g(x1) to not change, then to first order in ϵ we need g′(x0)⋅ϵ=0.
If we want f(x1) to be larger, we need f′(x0)⋅ϵ>0.
If f′(x0) and g′(x0) are parallel, say f′(x0)=λg′(x0), then f′(x0)⋅ϵ=λg′(x0)⋅ϵ. Any ϵ with f′(x0)⋅ϵ>0 therefore has g′(x0)⋅ϵ≠0, so attempting to improve f(x0+ϵ) will change g(x0+ϵ), and thereby violate the constraint
g(x0+ϵ)=c.
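The wiggle argument can be tried numerically. The sketch below assumes a hypothetical example, f(x, y) = x + y with the constraint x² + y² = 1, and compares a feasible point where the gradients are not parallel with one where they are. The helper names (`tangent_wiggle`, `results`) are illustrative, and only first-order changes are computed.

```python
import math

def grad_f(p):
    # Gradient of the assumed objective f(x, y) = x + y.
    return (1.0, 1.0)

def grad_g(p):
    # Gradient of the assumed constraint function g(x, y) = x^2 + y^2.
    return (2.0 * p[0], 2.0 * p[1])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def tangent_wiggle(p, step):
    # Rotate g'(p) by 90 degrees and rescale to length `step`,
    # so that g'(p) . eps = 0 by construction.
    gx, gy = grad_g(p)
    norm = math.hypot(gx, gy)
    return (-gy / norm * step, gx / norm * step)

step = 1e-3
results = {}
for name, theta in [("non-parallel", 0.0), ("parallel", math.pi / 4)]:
    x0 = (math.cos(theta), math.sin(theta))  # feasible: lies on the unit circle
    eps = tangent_wiggle(x0, step)
    dg = dot(grad_g(x0), eps)  # first-order change in g
    df = dot(grad_f(x0), eps)  # first-order change in f
    results[name] = (dg, df)
    print(name, "dg =", dg, "df =", df)
```

At the first point the constraint-preserving wiggle has a positive first-order change in f, so the point can be improved. At the second point, where f′ and g′ are parallel, the same kind of wiggle leaves f unchanged to first order.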