## § Differentiating through sampling from a random normal distribution

- Credits to [Edward Eriksson] for teaching me this.
- The key idea is that a sample from a normal distribution with mean $\mu$ and standard deviation $\sigma$ can be written as a deterministic function of a sample from the *standard* normal distribution: $x = \mu + \sigma z$ with $z \sim N(0, 1)$. The randomness now lives in $z$, not in the parameters. (Below, $\mu = 0$ for simplicity.)

- $y = f(\sigma z)$ where $z \sim N(0, 1)$.
- Then, by treating $z$ as a constant, we see that $dy/d\sigma = f'(\sigma z) \cdot z$ by chain rule.
- That is, we treat $z$ as a "constant" and take the gradient with respect to $\sigma$, which lets us run gradient descent on $\sigma$.
- My belief in this remains open until I can read a textbook, but I have it on good authority that this is correct.
- How does this relate to the VAE optimisation? It's the same trick: we claim that $sample(N(0, 1))$ can be held constant during backprop, as if the internal structure of the $sample$ function did not matter. Amazing.
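A quick sanity check of the chain-rule claim above: with $z$ sampled once and then held fixed, the analytic derivative $f'(\sigma z) \cdot z$ should match a finite-difference estimate of $y(\sigma) = f(\sigma z)$. This is a minimal sketch using a toy $f$ of my own choosing, not anything from the original derivation:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.sin(x + 0.1)

def fprime(x):
    return np.cos(x + 0.1)

sigma = 1.3
z = rng.normal(0.0, 1.0)  # sampled once, then treated as a constant

# Analytic gradient via the chain rule: dy/dsigma = f'(sigma * z) * z
analytic = fprime(sigma * z) * z

# Central finite difference in sigma, with z held fixed
eps = 1e-6
numeric = (f((sigma + eps) * z) - f((sigma - eps) * z)) / (2 * eps)

print(analytic, numeric)  # the two should agree to ~6 decimal places
```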

```python
import numpy as np

sigma = 1.0

def f(x):
    return np.sin(x + 0.1)

def fprime(x):
    return np.cos(x + 0.1)

# Stochastic gradient descent on sigma via the reparameterization trick.
for i in range(1000):
    z = np.random.normal(0, 1)  # sample from the standard normal
    sz = sigma * z              # reparameterized sample ~ N(0, sigma^2)
    fx = f(sz)
    gradfx = fprime(sz)
    dsigma = gradfx * z         # dy/dsigma = f'(sigma * z) * z, with z held constant
    print("z = %5.2f | f = %6.2f | df = %6.2f | sigma = %6.2f | dsigma = %6.2f" %
          (z, fx, gradfx, sigma, dsigma))
    sigma = sigma - 0.01 * dsigma
```
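The loop above uses one sample per step, which is what a VAE does in practice; the quantity actually being descended is the expectation $E_{z}[f(\sigma z)]$. As a sanity check on the estimator (my own, using the same toy $f$ as above), we can average the per-sample gradients $f'(\sigma z) \cdot z$ over many draws of $z$ and compare against a finite difference of the Monte Carlo expectation computed with the *same* draws:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    return np.sin(x + 0.1)

def fprime(x):
    return np.cos(x + 0.1)

sigma = 0.8
z = rng.normal(0.0, 1.0, size=200_000)  # common random numbers for both estimates

# Reparameterization gradient of E_z[f(sigma * z)]: average per-sample gradients.
grad_mc = np.mean(fprime(sigma * z) * z)

# Finite difference of the Monte Carlo expectation, reusing the same z draws
# so the sampling noise cancels between the two evaluations.
eps = 1e-4
fd = (np.mean(f((sigma + eps) * z)) - np.mean(f((sigma - eps) * z))) / (2 * eps)

print(grad_mc, fd)  # should agree closely
```

Reusing the same `z` in both finite-difference evaluations is what makes the comparison tight; with independent draws the Monte Carlo noise would swamp the small `eps` perturbation.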