A Universe of Sorts

§ The geometry of Lagrange multipliers

If we want to maximise a function $f(x)$ subject to the constraint $g(x) = c$, one uses the method of Lagrange multipliers. The idea is to consider a new function $L(x, \lambda) = f(x) + \lambda (c - g(x))$. Now, if $x^\star$ is a local maximum of the constrained problem (and $g'(x^\star) \neq 0$), then there is a $\lambda$ for which the following conditions hold:

1. $\frac{\partial L}{\partial x} = 0$ : $f'(x^\star) - \lambda g'(x^\star) = 0$ .
2. $\frac{\partial L}{\partial \lambda} = 0$ : $g(x^\star) = c$ .
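
To make the two conditions concrete, here is a minimal sketch that checks them numerically. The problem used (maximise $f(x, y) = x + y$ on the unit circle $x^2 + y^2 = 1$) is a hypothetical example chosen for illustration, not one from the text, and the candidate optimum and multiplier were worked out by hand for it.

```python
import math

# Hypothetical example problem (an assumption, not from the text):
# maximise f(x, y) = x + y subject to g(x, y) = x^2 + y^2 = c, with c = 1.
C = 1.0

def g(p):
    x, y = p
    return x * x + y * y

def grad_f(p):
    # f(x, y) = x + y has a constant gradient.
    return (1.0, 1.0)

def grad_g(p):
    x, y = p
    return (2.0 * x, 2.0 * y)

def stationarity_residual(p, lam):
    """Return f'(p) - lam * g'(p), which the first condition says must vanish."""
    return tuple(a - lam * b for a, b in zip(grad_f(p), grad_g(p)))

# Candidate optimum and multiplier, derived by hand for this example.
x_star = (1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0))
lam_star = 1.0 / math.sqrt(2.0)

residual = stationarity_residual(x_star, lam_star)
print("f'(x*) - lambda g'(x*) =", residual)   # should be ~ (0, 0)
print("g(x*) - c =", g(x_star) - C)           # should be ~ 0
```

Both printed quantities should be zero up to floating-point rounding, which is what the two conditions demand.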

Equation (2) is sensible: we want our optimum to satisfy the constraint that we originally imposed. What is Equation (1) trying to say? Geometrically, it asks that $f'(x^\star)$ be parallel to $g'(x^\star)$. Why is this a good ask?
Let us say that we are at an $x_0$ which is a feasible point ($g(x_0) = c$).
We are interested in wiggling $x_0 \xrightarrow{\text{wiggle}} x_0 + \vec\epsilon \equiv x_1$ such that:

1. $x_1$ is still feasible: $g(x_1) = c = g(x_0)$ .
2. $x_1$ is an improvement: $f(x_1) > f(x_0)$ .

If we want $g(x_1)$ to not change, then to first order in $\vec \epsilon$ we need $g'(x_0) \cdot \vec \epsilon = 0$ .
If we want $f(x_1)$ to be larger, then to first order in $\vec \epsilon$ we need $f'(x_0) \cdot \vec \epsilon > 0$ .
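
The two wiggle conditions can be tried out numerically. The sketch below assumes a hypothetical example (maximise $f(x, y) = x + y$ on the circle $x^2 + y^2 = 1$) and a helper `tangent_wiggle` that is not from the text: it builds a small step orthogonal to $g'(x_0)$ by rotating the gradient by 90 degrees. At the feasible but non-optimal point $(1, 0)$, such a step improves $f$ while changing $g$ only at second order.

```python
import math

# Hypothetical example (an assumption): f(x, y) = x + y, g(x, y) = x^2 + y^2.
def f(p):
    return p[0] + p[1]

def g(p):
    return p[0] ** 2 + p[1] ** 2

def grad_f(p):
    return (1.0, 1.0)

def grad_g(p):
    return (2.0 * p[0], 2.0 * p[1])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def tangent_wiggle(p, step):
    """Small step orthogonal to g'(p): rotate g'(p) by 90 degrees and rescale."""
    gx, gy = grad_g(p)
    norm = math.hypot(gx, gy)
    return (-gy / norm * step, gx / norm * step)

x0 = (1.0, 0.0)                      # feasible: g(x0) = 1, but not optimal
eps = tangent_wiggle(x0, 1e-3)
x1 = (x0[0] + eps[0], x0[1] + eps[1])

print("g'(x0) . eps =", dot(grad_g(x0), eps))   # zero: feasible to first order
print("f'(x0) . eps =", dot(grad_f(x0), eps))   # positive: an improvement
print("f(x1) - f(x0) =", f(x1) - f(x0))
print("g(x1) - g(x0) =", g(x1) - g(x0))         # only a second-order change
```

Here $f'(x_0) = (1, 1)$ and $g'(x_0) = (2, 0)$ are not parallel, which is exactly why a feasible, improving wiggle exists.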

So if $f'(x_0)$ and $g'(x_0)$ are parallel, then attempting to improve $f(x_0 + \vec \epsilon)$ will change $g(x_0 + \vec \epsilon)$, and thereby violate the constraint $g(x_0 + \vec \epsilon) = c$.
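
The parallel case can be spelled out in one short derivation. This is a sketch that assumes "parallel" means $f'(x_0) = \lambda g'(x_0)$ for some scalar $\lambda$, as in Equation (1):

```latex
\begin{gather*}
f'(x_0) \cdot \vec\epsilon = \lambda \, \bigl( g'(x_0) \cdot \vec\epsilon \bigr) \\
f'(x_0) \cdot \vec\epsilon > 0
  \;\implies\; \lambda \, \bigl( g'(x_0) \cdot \vec\epsilon \bigr) > 0
  \;\implies\; g'(x_0) \cdot \vec\epsilon \neq 0
\end{gather*}
```

Read the other way round: any wiggle that keeps $g$ fixed to first order has $g'(x_0) \cdot \vec\epsilon = 0$, hence $f'(x_0) \cdot \vec\epsilon = 0$, so no feasible wiggle improves $f$ to first order.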