## § Differentiating through sampling from a random normal distribution

• Credits to [Edward Eriksson] for teaching me this.
• The key idea is that a sample from a normal distribution with mean $\mu$ and standard deviation $\sigma$ can be written as a function of a sample from the standard normal distribution: $x = \mu + \sigma z$ where $z \sim N(0, 1)$.
• Take $\mu = 0$ and consider $y = f(\sigma z)$ where $z \sim N(0, 1)$.
• Then, by treating $z$ as a constant, we see that $dy/d\sigma = f'(\sigma z) \cdot z$ by chain rule.
• That is, we treat $z$ as a constant and optimise the objective with respect to $\sigma$.
• My belief in this remains open until I can read a textbook, but I have it on good authority that this is correct.
• How does this relate to VAE optimisation? It's the same trick (the reparameterisation trick), where we claim that $sample(N(0, 1))$ can be held constant during backprop, as if the internal structure of the $sample$ function did not matter. Amazing.
#!/usr/bin/env python3
import numpy as np

sigma = 1.0

# # function we are minimising over
# def f (x): return - x*x
# # derivative of function we are minimising over
# def fprime(x): return -2*x

# function we are minimising over
def f (x): return np.sin(x + 0.1)

# derivative of function we are minimising over
def fprime(x): return np.cos(x + 0.1)

# d/d sigma of f(sigma z) = f'(sigma z) z  (chain rule, z held fixed)
# \partial_\sigma E[f(X_\sigma)] = E[\partial_\sigma f(X_\sigma)]
lr = 0.01
for i in range(1000):
    # sample z from the standard normal distribution
    z = np.random.normal(0, 1)
    # reparameterise: sigma * z is a sample from N(0, sigma^2)
    sz = sigma * z
    # pathwise gradient estimate: treat z as a constant and apply the chain rule
    grad = fprime(sz) * z
    # gradient descent step on sigma
    sigma -= lr * grad
print(sigma)
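
As a sanity check on the claim $\partial_\sigma E[f(\sigma z)] = E[f'(\sigma z) \, z]$, here is a minimal sketch (variable names like `pathwise` and `fd` are my own): average the pathwise estimate $f'(\sigma z) z$ over many samples, and compare it against a central finite difference of the sample mean of $f(\sigma z)$, using the same draws of $z$ for both.

```python
import numpy as np

def f(x):
    return np.sin(x + 0.1)

def fprime(x):
    return np.cos(x + 0.1)

rng = np.random.default_rng(0)
sigma = 1.0
z = rng.normal(0.0, 1.0, size=200_000)

# pathwise (reparameterisation) estimate: E[f'(sigma z) * z]
pathwise = np.mean(fprime(sigma * z) * z)

# finite-difference check on the same samples of z
eps = 1e-4
fd = (np.mean(f((sigma + eps) * z)) - np.mean(f((sigma - eps) * z))) / (2 * eps)

print(pathwise, fd)
```

Because both estimates are evaluated on the same draws of $z$, they should agree to within the finite-difference truncation error, which is far smaller than the Monte Carlo noise.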