## § The geometry of Lagrange multipliers

If we want to maximise a function $f(x)$ subject to the constraint $g(x) = c$, we use the method of Lagrange multipliers. The idea is to consider a new function $L(x, \lambda) = f(x) + \lambda (c - g(x))$. Now, if $x^\star$ is a local maximum of the constrained problem, then there is a $\lambda$ such that the following conditions hold:
1. $\frac{\partial L}{\partial x} = 0$: $f'(x^\star) - \lambda g'(x^\star) = 0$.
2. $\frac{\partial L}{\partial \lambda} = 0$: $g(x^\star) = c$.
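As a small worked example (the specific $f$, $g$ and $c$ here are chosen purely for illustration), take $x = (u, v)$, $f(u, v) = u + v$, $g(u, v) = u^2 + v^2$ and $c = 1$, so that $L = u + v + \lambda (1 - u^2 - v^2)$. The two conditions then read:
• $\frac{\partial L}{\partial x} = 0$: $1 - 2 \lambda u = 0$ and $1 - 2 \lambda v = 0$, so $u = v = \frac{1}{2 \lambda}$.
• $\frac{\partial L}{\partial \lambda} = 0$: $u^2 + v^2 = 1$, so $\frac{1}{2 \lambda^2} = 1$ and $\lambda = \pm \frac{1}{\sqrt 2}$.
Taking $\lambda = \frac{1}{\sqrt 2}$ gives the candidate $(u, v) = (\frac{1}{\sqrt 2}, \frac{1}{\sqrt 2})$ with $f = \sqrt 2$, which is the maximum of $f$ on the circle. Taking $\lambda = -\frac{1}{\sqrt 2}$ gives the opposite point, where $f = -\sqrt 2$ is the minimum.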
Equation (2) is sensible: we want our optimum to satisfy the constraint that we had originally imposed. What is Equation (1) trying to say? Geometrically, it's asking for $f'(x^\star)$ to be parallel to $g'(x^\star)$. Why is this a good ask? Let us say that we are at a feasible point $x_0$, so that $g(x_0) = c$. We are interested in wiggling $x_0 \xrightarrow{wiggle} x_0 + \vec\epsilon \equiv x_1$ such that:
• $x_1$ is still feasible: $g(x_1) = c = g(x_0)$.
• $x_1$ is an improvement: $f(x_1) > f(x_0)$.
To first order in $\vec \epsilon$, these two requirements become:
• If we want $g(x_1)$ to not change, then we need $g'(x_0) \cdot \vec \epsilon = 0$.
• If we want $f(x_1)$ to be larger, we need $f'(x_0) \cdot \vec \epsilon > 0$.
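If $f'(x_0)$ and $g'(x_0)$ are parallel, say $f'(x_0) = \lambda g'(x_0)$, then $f'(x_0) \cdot \vec \epsilon = \lambda \, g'(x_0) \cdot \vec \epsilon$. So any attempt to improve $f(x_0 + \vec \epsilon)$ will also change $g(x_0 + \vec \epsilon)$, and thereby violate the constraint $g(x_0 + \vec \epsilon) = c$. No feasible wiggle improves $f$, which is exactly what we expect at a constrained optimum.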
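
The wiggle argument can be checked numerically. The sketch below is only an illustration under assumed choices that are not part of the argument above: $f(u, v) = u + v$, $g(u, v) = u^2 + v^2$, $c = 1$, and a hypothetical helper `feasible_ascent_direction` that takes the component of $f'$ orthogonal to $g'$. At a feasible point that is not optimal this component is non-zero, so a wiggle exists that keeps $g$ fixed to first order while increasing $f$. At the constrained maximum the gradients are parallel and the component vanishes.

```python
import math

def grad_f(p):
    # Assumed example objective f(u, v) = u + v: the gradient is constant.
    return (1.0, 1.0)

def grad_g(p):
    # Assumed example constraint g(u, v) = u^2 + v^2.
    u, v = p
    return (2.0 * u, 2.0 * v)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def feasible_ascent_direction(p):
    """Component of f'(p) orthogonal to g'(p) (hypothetical helper).

    A wiggle eps along this vector satisfies g'(p) . eps = 0, so the
    constraint is kept to first order, and f'(p) . eps = |eps|^2 >= 0.
    """
    df, dg = grad_f(p), grad_g(p)
    scale = dot(df, dg) / dot(dg, dg)
    return (df[0] - scale * dg[0], df[1] - scale * dg[1])

# A feasible point that is not optimal: the gradients are not parallel.
x0 = (1.0, 0.0)
eps0 = feasible_ascent_direction(x0)
print("at x0:     eps =", eps0,
      " g'.eps =", dot(grad_g(x0), eps0),
      " f'.eps =", dot(grad_f(x0), eps0))

# The constrained maximum: the gradients are parallel, so eps collapses to ~0.
x_star = (1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0))
eps_star = feasible_ascent_direction(x_star)
print("at x_star: |eps| =", math.hypot(*eps_star))
```

The projection is one simple way to search for a feasible improving wiggle. It only reasons to first order, so a finite step along `eps0` would still drift slightly off the circle.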