§ The geometry of Lagrange multipliers

If we want to minimise a function $f(x)$ subject to the constraint $g(x) = c$, we can use the method of Lagrange multipliers. The idea is to consider a new function $L(x, \lambda) = f(x) + \lambda (c - g(x))$. Now, if one has a local optimum $x^\star$ (with associated multiplier $\lambda$), then the following conditions hold:
  1. $\frac{\partial L}{\partial x} = 0$: $f'(x^\star) - \lambda g'(x^\star) = 0$.
  2. $\frac{\partial L}{\partial \lambda} = 0$: $g(x^\star) = c$.
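
As a small worked instance of these two conditions, take $f(x, y) = x^2 + y^2$ with the constraint $g(x, y) = x + y = 1$. This particular choice of $f$, $g$ and $c$ is an illustrative assumption, not something fixed by the discussion here. With two variables, condition (1) becomes one equation per variable:

$$
\begin{aligned}
L(x, y, \lambda) &= x^2 + y^2 + \lambda (1 - x - y) \\
\frac{\partial L}{\partial x} = 0 &: \quad 2x - \lambda = 0 \\
\frac{\partial L}{\partial y} = 0 &: \quad 2y - \lambda = 0 \\
\frac{\partial L}{\partial \lambda} = 0 &: \quad x + y = 1
\end{aligned}
$$

The first two equations give $x = y = \lambda / 2$. Substituting into the third gives $\lambda = 1$, so the candidate optimum is $x^\star = y^\star = 1/2$.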

Equation (2) is sensible: we want our optimum to satisfy the constraint that we had originally imposed. What is Equation (1) trying to say? Geometrically, it's asking us to keep $f'(x^\star)$ parallel to $g'(x^\star)$. Why is this a good ask?
Let us say that we are at a feasible point $x_0$ (that is, $g(x_0) = c$).
We are interested in wiggling $x_0 \xrightarrow{\text{wiggle}} x_0 + \vec\epsilon \equiv x_1$.


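If $f'(x_0)$ and $g'(x_0)$ are parallel, then any attempt to improve $f(x_0 + \vec\epsilon)$ also changes $g(x_0 + \vec\epsilon)$, and thereby violates the constraint $g(x_0 + \vec\epsilon) = c$. To first order, $f(x_0 + \vec\epsilon) \approx f(x_0) + f'(x_0) \cdot \vec\epsilon$ and $g(x_0 + \vec\epsilon) \approx g(x_0) + g'(x_0) \cdot \vec\epsilon$; when $f'(x_0) = \lambda g'(x_0)$, any $\vec\epsilon$ that changes $f$ to first order changes $g$ as well.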
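
The wiggle argument can be checked numerically. The sketch below is a minimal illustration under assumed choices: the functions $f(x, y) = x^2 + y^2$ and $g(x, y) = x + y$ with $c = 1$, and the helper names `cross`, `feasible_wiggle` and `first_order_change`, are all hypothetical and picked only for this example. It steps along a direction that keeps $g$ fixed and measures how much $f$ changes per unit step.

```python
# Assumed example problem: minimise f(x, y) = x^2 + y^2 on the line x + y = 1.

def f(x, y):
    return x * x + y * y

def g(x, y):
    return x + y

def grad_f(x, y):
    return (2.0 * x, 2.0 * y)

def grad_g(x, y):
    return (1.0, 1.0)

def cross(u, v):
    """2D cross product; it is zero exactly when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

def feasible_wiggle(x, y, t):
    """Step of size t perpendicular to grad g, so g does not change to first order."""
    gx, gy = grad_g(x, y)
    return (x - t * gy, y + t * gx)

def first_order_change(x, y, t=1e-4):
    """Change in f per unit step along the feasible wiggle direction."""
    x1, y1 = feasible_wiggle(x, y, t)
    return (f(x1, y1) - f(x, y)) / t

# At (0.5, 0.5) the gradients are parallel: a feasible wiggle barely moves f.
print(cross(grad_f(0.5, 0.5), grad_g(0.5, 0.5)), first_order_change(0.5, 0.5))

# At (1.0, 0.0) the gradients are not parallel: a feasible wiggle improves f.
print(cross(grad_f(1.0, 0.0), grad_g(1.0, 0.0)), first_order_change(1.0, 0.0))
```

At the point where the gradients are parallel, the rate of change of $f$ along the feasible direction should be close to zero, so the only way left to improve $f$ is to leave the constraint. At the feasible point $(1, 0)$, where the gradients are not parallel, the rate should be close to $-2$: $f$ can still be decreased while staying on the constraint, so that point cannot be an optimum.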